How we measure what AI recommends
Every page in this section reports what four major AI models said when asked a specific buyer question. This page documents how the measurement is performed, the sample size, and the limits of the method.
The procedure
For each category, we run one fixed question against four AI models: ChatGPT, Gemini, Claude, and Perplexity. The question is identical across models, and no system prompt or additional context is attached.
Each model's response is stored verbatim. We extract the brand names mentioned, the order in which they appear, and any sources the model cites where it exposes them (Perplexity and Claude provide citations; ChatGPT and Gemini expose them inconsistently).
Measurements are refreshed quarterly. The "measured on" date on each page reflects the most recent run.
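For readers who want the mechanics, here is a minimal sketch of what one run produces. The query_model stub stands in for each provider's API call, and the field names are illustrative, not our production schema.

```python
from dataclasses import dataclass, field
from datetime import date

MODELS = ["ChatGPT", "Gemini", "Claude", "Perplexity"]

def query_model(model: str, question: str) -> str:
    """Placeholder for the per-provider API call. No system prompt
    or extra context is attached; the question goes in as-is."""
    raise NotImplementedError

@dataclass
class RunRecord:
    model: str
    question: str
    measured_on: date
    raw_response: str                                   # stored verbatim
    brands: list[str] = field(default_factory=list)     # in order of first mention
    citations: list[str] = field(default_factory=list)  # empty if the model exposed none

def run_category(question: str) -> list[RunRecord]:
    # The same fixed question goes to every model in one quarterly run.
    return [
        RunRecord(
            model=m,
            question=question,
            measured_on=date.today(),
            raw_response=query_model(m, question),
        )
        for m in MODELS
    ]
```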
Honest limitations
Sample size is small
One question per category per quarter. AI responses vary substantially across runs: SparkToro's research found less than a 1-in-100 chance that two identical queries return the same brand list. A single run is a snapshot, not a definitive ranking.
Sentiment is not LLM-validated
Where we display sentiment, it is heuristic: keyword patterns in the 200 to 500 characters surrounding each brand mention. We do not re-score responses with a separate LLM for sentiment. Read sentiment columns as directional.
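As a rough illustration of that heuristic (the keyword lists and window radius below are illustrative, not the production values):

```python
import re

# Illustrative keyword lists; the real lists are longer and maintained by hand.
POSITIVE = {"best", "recommended", "excellent", "strong", "popular"}
NEGATIVE = {"weak", "expensive", "limited", "avoid", "dated"}

def window_sentiment(response: str, brand: str, radius: int = 250) -> str:
    """Score the text window around a brand's first mention by counting
    keyword hits. Returns 'positive', 'negative', 'neutral', or 'unknown'."""
    match = re.search(re.escape(brand), response)
    if match is None:
        return "unknown"
    start = max(0, match.start() - radius)
    window = response[start:match.end() + radius].lower()
    words = set(re.findall(r"[a-z']+", window))
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Anything this coarse can only ever be directional, which is why we present it that way.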
Citations are model-dependent
Perplexity and Claude expose their citation sources; ChatGPT and Gemini do so inconsistently. When citations are missing, it means the model did not return them, not that none exist.
Brand extraction is rule-based
We parse model responses for capitalised proper nouns that match a maintained alias list. Edge cases (newly launched brands, ambiguous names) may be missed. We update the alias list each refresh cycle.
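A simplified version of that rule, with an illustrative alias list:

```python
import re

# Maps surface forms to a canonical name. Entries are illustrative;
# the real list is maintained by hand and updated each refresh cycle.
ALIASES = {
    "SEMrush": "Semrush",
    "Semrush": "Semrush",
    "Hub Spot": "HubSpot",
    "HubSpot": "HubSpot",
}

def extract_brands(response: str) -> list[str]:
    """Return canonical brand names in order of first mention.
    Capitalised candidates that miss the alias list are dropped, which
    is why newly launched or ambiguously named brands can slip through."""
    seen: set[str] = set()
    ordered: list[str] = []
    # Candidates are runs of capitalised words; a deliberately simple rule.
    for m in re.finditer(r"\b[A-Z][A-Za-z0-9]*(?:\s[A-Z][A-Za-z0-9]*)*", response):
        canonical = ALIASES.get(m.group(0))
        if canonical and canonical not in seen:
            seen.add(canonical)
            ordered.append(canonical)
    return ordered
```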
This is an observatory, not a buyer's guide
These pages show what AI says about each category at one point in time. They are not buying recommendations. Use them to understand AI recommendation patterns, not to choose vendors.
Scope
We currently track 25 category questions across SaaS, AI visibility, SEO, content, PR, e-commerce, and agency tooling. Questions are chosen because they are common buyer queries among Honeyb's audience.
We expand the question set when a clear gap emerges in a category. Suggest a category by emailing us, or run your own brand against the same models using our free AI visibility checker.
Why publish this at all
Two reasons.
First, AI recommendation has become a real discovery channel: according to Capgemini's 2025 data, 58 percent of consumers have replaced traditional search with AI for product research. Buyers are reading these answers and making decisions from them. Showing what AI actually says, transparently, helps everyone understand the channel better.
Second, Honeyb runs this kind of measurement professionally for customers, at much higher sample sizes, across full prompt sets, with proper sentiment validation. The pages here are an honest, public-facing version of the same discipline.
See your own brand the same way
Run the same kind of measurement on your brand across every major AI model. Free, instant, no signup required.