When you ask ChatGPT, Gemini or Perplexity a question, the engine often answers with named sources attached. Those sources are AI citations: the specific pages an engine pulled from to construct that particular answer. For brands, earning them is the new equivalent of ranking, because a citation is the moment a model decides your page is worth quoting. The honest complication is that there is no single lever to pull. Each engine retrieves and cites differently, citations rarely overlap between them, and the same query can produce different sources from one day to the next. This guide explains how AI citations actually work, which sources engines lean on, and what the research shows you can do to get cited by AI. The short version: you do not optimise once for everything, you optimise per engine and you measure the result.
What an AI citation actually is
Modern answer engines use retrieval-augmented generation, usually shortened to RAG. Rather than answering purely from what the model memorised in training, the engine runs a live search at query time, retrieves a set of passages, synthesises them into an answer, and attaches the sources it drew from. A citation, then, is the engine showing its work for one specific answer. This is a meaningful difference from a traditional blue-link ranking. A Google ranking is a relatively stable position earned by a page over time. An AI citation is a per-answer decision, assembled fresh, that can change when the query changes, when the index updates, or when the model is asked the same thing twice. Understanding that distinction matters, because it explains why being cited once tells you very little about whether you will be cited again.
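The retrieve, synthesise and cite loop described above can be sketched in a few lines of Python. This is a toy illustration only: the `Passage` type, the word-overlap scoring, the joined-text "synthesis" and the example URLs are all assumptions made for the sketch, not how any real engine retrieves or ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    url: str   # hypothetical source page
    text: str  # passage pulled from that page


def retrieve(query: str, index: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by naive word overlap with the query (a stand-in for real search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in index]
    scored = [(score, p) for score, p in scored if score > 0]
    # Python's sort is stable, so ties keep their original index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def answer_with_citations(query: str, index: list[Passage], k: int = 2):
    """Retrieve passages, 'synthesise' an answer, and attach the sources used."""
    passages = retrieve(query, index, k)
    answer = " ".join(p.text for p in passages)  # placeholder for model synthesis
    citations = [p.url for p in passages]
    return answer, citations


# Hypothetical mini-index of three pages.
INDEX = [
    Passage("https://example.com/pricing", "the starter plan costs ten dollars per month"),
    Passage("https://example.org/review", "reviewers say the starter plan is good value"),
    Passage("https://example.net/history", "the company was founded in a garage"),
]

answer, citations = answer_with_citations("how much does the starter plan cost", INDEX)
_, other_citations = answer_with_citations("when was the company founded", INDEX)
```

Even in this toy version, the two queries end up with different source lists, which is the per-answer behaviour the paragraph above describes.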
How each engine decides what to cite
The engines do not share a retrieval system, and it shows. Gemini is grounded in Google's search index, so it tends to surface the sources Google already trusts, which skews toward official brand sites and established publishers. Perplexity behaves more like a search engine that answers directly, pulling and citing many sources per claim. ChatGPT uses its own retrieval layer, and the mix of sources it favours varies considerably by industry and query type. According to Yext's analysis of how the major engines decide what to cite, Claude cites user-generated content at roughly two to four times the rate of the other models, while structured and verified data was the single largest category of distinct citation sources at 54.53 percent. The practical takeaway is that the same page can be a strong fit for one engine and invisible to another. If you want a deeper read on the signals behind these decisions, our breakdown of how AI models choose which brands to recommend covers the ranking factors that recur across engines.
The uncomfortable truth: citations barely overlap
The biggest misconception about getting cited by AI is that one well-optimised page will be picked up everywhere. The data says otherwise. SurfacedBy analysed 127,198 citations across ChatGPT, Claude, Gemini, Perplexity and Google AI Mode, drawn from 16,400 answers between late March and late June 2026. They found that 69.6 percent of cited domains appeared in only one engine, and just 2.7 percent, 309 domains in total, were cited by all five. Citations were also concentrated: the top 10 domains accounted for 20.6 percent of all citations and the top 100 for 42 percent, while nearly 43 percent of domains were cited exactly once.
The same study found wildly different citation volumes per answer: Gemini averaged 11.0 sources, Perplexity 8.6, Google AI Mode 7.8, Claude 6.8 and ChatGPT just 3.7. That last figure is worth dwelling on. ChatGPT is the most selective of the major engines, which makes a slot there harder to earn and more valuable when you do earn one. None of this is a single problem to solve once. It is a set of distinct problems, one per engine, each of which you have to monitor separately.
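One way to compute this kind of overlap figure on your own data is to group cited domains by the engines that cited them. The sketch below assumes you have already collected (engine, domain) pairs; the sample records, domain names and function name are hypothetical and are not drawn from the SurfacedBy dataset.

```python
from collections import defaultdict


def overlap_summary(citations: list[tuple[str, str]]) -> dict[str, float]:
    """Summarise how many cited domains appear in one engine versus all engines."""
    engines_by_domain: dict[str, set[str]] = defaultdict(set)
    for engine, domain in citations:
        engines_by_domain[domain].add(engine)

    total = len(engines_by_domain)
    if total == 0:
        raise ValueError("no citations to summarise")

    all_engines = {engine for engine, _ in citations}
    single = sum(1 for seen in engines_by_domain.values() if len(seen) == 1)
    everywhere = sum(1 for seen in engines_by_domain.values() if seen == all_engines)
    return {
        "domains": total,
        "single_engine_share": single / total,
        "all_engine_share": everywhere / total,
    }


# Hypothetical (engine, domain) pairs from a handful of answers.
SAMPLE = [
    ("chatgpt", "wikipedia.org"),
    ("gemini", "wikipedia.org"),
    ("perplexity", "wikipedia.org"),
    ("chatgpt", "example-reviews.com"),
    ("gemini", "brand.example"),
    ("perplexity", "brand.example"),
    ("perplexity", "forum.example"),
]

summary = overlap_summary(SAMPLE)
```

In this made-up sample, half of the four domains are cited by a single engine and one is cited by all three. Real data would need domain normalisation (for example, folding subdomains together) before the shares mean much.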
A side-by-side view of the engines
The table below summarises the structural differences that shape what each engine cites. The source-count figures come from the SurfacedBy study cited above, and the grounding descriptions from Yext's analysis. Treat the averages as directional rather than fixed, because they shift with query type and over time.
Engine | Grounding source | Avg sources per answer | What it tends to favour
--- | --- | --- | ---
ChatGPT | Own retrieval layer, varies by industry | 3.7 | Selective mix, third-party validation
Gemini | Google search index | 11.0 | Official brand sites, established publishers
Perplexity | Direct web search, many cites per claim | 8.6 | Breadth of sources, broad coverage
Claude | Own retrieval layer | 6.8 | User-generated content at 2 to 4x other models
Google AI Mode | Google search index | 7.8 | Index-trusted pages, structured data
Which sources AI engines cite most
Knowing where engines actually look helps you decide where to invest. Similarweb analysed roughly 600,000 US citation events across January and February 2026 and found ChatGPT's most-cited domains were Wikipedia at 13.15 percent, Reddit at 11.97 percent, OpenAI's own properties at 6.21 percent and Walmart at 2.90 percent. For Google AI Mode the leaders were Fandom at 7.16 percent, Wikipedia at 5.21 percent, YouTube at 4.91 percent and Reddit at 4.19 percent. The SurfacedBy and Similarweb studies disagree on the exact magnitude of Reddit and Wikipedia's share, which is expected given different methods and dates, so it is safer to read these as a range than as fixed truth. The pattern that holds across both is more useful anyway: even the most-cited domain rarely tops the low teens, and the long tail of single-cited domains dominates. You do not need to be Wikipedia. You need to be present in the sources engines already trust for your niche, which often means community threads, review platforms, authoritative lists and your own well-structured pages. Our piece on why AI models cite Reddit more than your website explains why community signals carry so much weight here.
How to earn citations: what the research shows works
The most rigorous public evidence comes from a 2024 study by researchers at Princeton, Georgia Tech and IIT Delhi, presented at KDD and published as GEO: Generative Engine Optimization. They tested specific content changes against a benchmark of real generative-engine queries and found that the right methods lifted a source's visibility in AI answers by up to 40 percent. The highest-impact changes were adding relevant statistics, citing credible sources for your claims, and including direct quotations from credible voices. Notably, keyword stuffing performed worse than the baseline, which is a clean reversal of an old SEO instinct.
The mechanism behind these results is intuitive once you see it. An engine assembling an answer is looking for clean, attributable passages it can lift and credit. Content that states a fact, backs it with a number, and points to a source gives the model exactly that.
Two practical implications follow. First, make your content extractable: clear claims, supporting data, and structure a model can parse. Structured data reinforces this, which is why schema is worth getting right, and our guide to whether schema markup helps with AI visibility covers what the evidence actually supports. Second, build presence in third-party sources, because a brand mentioned across trusted reviews, lists and discussions has more places for an engine to find and cite it than a brand that exists only on its own homepage.
Citation is not a click, and not always accurate
It is worth being clear-eyed about what a citation buys you. Being cited still beats not being cited: Seer Interactive's April 2026 click data, reported by Search Engine Journal, found that brands cited in AI Overviews earned roughly 35 percent more organic clicks per impression than non-cited brands, with a cited click-through rate of 0.70 percent against 0.52 percent. But the absolute numbers are sobering, and Ahrefs found AI Overviews reduce clicks by 34.5 percent overall, because most users read the answer without clicking any source. So a citation is more about presence and influence inside the answer than about traffic.
There is also an accuracy problem. The Tow Center for Digital Journalism at Columbia tested 1,600 queries across eight engines in March 2025 and found the engines returned incorrect source information more than 60 percent of the time, with Perplexity wrong on 37 percent of queries and Grok-3 wrong on 94 percent, as Nieman Lab summarised. Engines fabricated links and credited syndicated copies instead of originals. That study predates the current model generation, but it remains the canonical reference, and the implication still stands: even when you earn a citation, the attribution can be wrong, which is one more reason to watch the answers rather than assume them.
How to know if it is working
Everything above leads to the same practical conclusion. Because citations are per-answer, per-engine, volatile and sometimes misattributed, you cannot judge your progress from a single check. Asking ChatGPT about your brand once is a snapshot, not a measurement, and AI recommendations can change roughly 70 percent of the time for the same query, as we documented in why spot-checking your AI visibility doesn't work. The only reliable way to know whether your work is earning citations is to track them across multiple engines, on a repeating cadence, over time. That is the core idea behind AI-visibility monitoring: you cannot improve what you cannot measure, and a single engine on a single day measures almost nothing. Earning citations is the work. Watching them across ChatGPT, Gemini, Perplexity and the rest is how you find out whether the work is paying off, and where to spend the next effort.
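To make the difference between a snapshot and a measurement concrete, here is a minimal sketch that turns repeated checks into a per-engine citation rate. The log format, dates, engine labels and domain are hypothetical, and collecting the answers themselves is out of scope; the point is only that a rate needs several observations per engine.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    day: str                       # ISO date the query was run (hypothetical)
    engine: str                    # which engine answered
    cited_domains: frozenset[str]  # domains cited in that one answer


def citation_rate(checks: list[Check], domain: str) -> dict[str, float]:
    """Share of checks per engine in which `domain` was cited."""
    runs: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for check in checks:
        runs[check.engine] += 1
        if domain in check.cited_domains:
            hits[check.engine] += 1
    return {engine: hits[engine] / runs[engine] for engine in runs}


# Hypothetical log: the same query run weekly against three engines.
LOG = [
    Check("2026-01-05", "chatgpt", frozenset({"brand.example", "wikipedia.org"})),
    Check("2026-01-12", "chatgpt", frozenset({"wikipedia.org"})),
    Check("2026-01-05", "gemini", frozenset({"brand.example"})),
    Check("2026-01-12", "gemini", frozenset({"brand.example", "reddit.com"})),
    Check("2026-01-05", "perplexity", frozenset({"reddit.com"})),
    Check("2026-01-12", "perplexity", frozenset({"wikipedia.org"})),
]

rates = citation_rate(LOG, "brand.example")
```

With this made-up log, a single check of ChatGPT on the first date would have suggested the brand is cited, while the second would have suggested it is not; only the repeated log shows a rate of one in two, alongside two in two for Gemini and zero in two for Perplexity.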





