AI search works as a five-step pipeline. The engine interprets your question, decides whether it can answer from its training data or needs fresh information, runs several searches against live indexes, synthesises the retrieved pages into a written answer, and attaches citations to a subset of the sources it used. The whole sequence usually completes in seconds. Every step filters something out, which is why two engines can take the same question and return different answers built on different sources.
This post walks the pipeline end to end, from the moment you press enter to the moment a cited answer appears. For the broader definition and history, start with the pillar on what AI search is. For how this differs from a classic results page, see AI search vs traditional search. And if you want a glossary of the underlying components rather than the process, that lives in the key technologies behind AI search. Here we follow the question through the machine.
The pipeline at a glance
Different engines implement the details differently, but the same five steps sit underneath ChatGPT, Perplexity, Google's AI Mode, Copilot, and the rest.
| Step | What happens |
|---|---|
| 1. Understand the question | The engine parses intent and breaks the question into smaller sub-queries |
| 2. Decide whether to search | It answers from training data alone or triggers live retrieval |
| 3. Retrieve sources | Sub-queries fan out across search indexes and candidate pages are fetched and ranked |
| 4. Synthesise the answer | The language model writes a response grounded in the retrieved text |
| 5. Cite and present | A subset of retrieved sources is selected, linked and formatted into the final answer |
The first two steps are invisible to you. Step 3 sometimes surfaces as a brief searching-the-web indicator. Steps 4 and 5 are what you actually see: text streaming in, followed by source links. The interesting work, and most of the failure modes, happen before a single word is written.
Step 1: understanding the question
Traditional search engines largely match the keywords in your query against an index. AI search engines start from the other end: they read the whole question and try to work out what you actually want. That starts with intent classification. Is this a fact lookup, a comparison, a recommendation request, a how-to? The classification shapes everything downstream, from whether the engine searches at all to how the final answer is formatted.
The second part is decomposition. A question like "best CRM for a five-person consultancy" is too specific to match any single page, so the engine breaks it into sub-queries it can actually search: CRM for small teams, CRM pricing comparison, lightweight CRM reviews. Conversation context feeds in too. If you follow up with "what about the cheapest one", the engine carries the CRM thread forward instead of treating the message as a brand-new question. From one question, the engine typically extracts four things:
- The core task: what the answer needs to accomplish
- Constraints: budget, team size, location, technical requirements stated or implied
- Format expectations: a comparison wants a table, a how-to wants steps
- Freshness: whether the question only makes sense with current information
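One way to picture the output of this step is as a small record holding those four things plus the sub-queries. The sketch below is illustrative only: the `ParsedQuery` name, its fields, the cue words and the string rules in `parse_question` are all assumptions made for this post. Real engines do this with a language model, not keyword matching.

```python
from dataclasses import dataclass, field

# Assumed cue words for illustration; a real engine infers this with a model.
FRESHNESS_CUES = ("latest", "current", "today", "2026")


@dataclass
class ParsedQuery:
    core_task: str                      # what the answer needs to accomplish
    constraints: list[str] = field(default_factory=list)
    answer_format: str = "prose"        # "table", "steps" or "prose"
    needs_fresh_data: bool = False
    sub_queries: list[str] = field(default_factory=list)


def parse_question(question: str) -> ParsedQuery:
    """Toy version of step 1 using string rules instead of a language model."""
    words = question.lower().split()
    if "best" in words or "vs" in words:
        task, answer_format = "recommendation", "table"
    elif words[:2] == ["how", "to"]:
        task, answer_format = "how-to", "steps"
    else:
        task, answer_format = "fact lookup", "prose"
    return ParsedQuery(
        core_task=task,
        constraints=[w for w in words if w.endswith("-person")],  # e.g. team size
        answer_format=answer_format,
        needs_fresh_data=any(cue in words for cue in FRESHNESS_CUES),
    )


parsed = parse_question("best CRM for a five-person consultancy")
# Sub-queries copied from the CRM example in this section; a real engine
# would generate them rather than have them typed in.
parsed.sub_queries = [
    "CRM for small teams",
    "CRM pricing comparison",
    "lightweight CRM reviews",
]
print(parsed)
```

The point of the structure is that everything downstream reads from it: the freshness flag feeds the search decision, the sub-queries feed retrieval, and the format hint shapes the final answer.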
Step 2: deciding whether to search
A language model already knows a great deal from training. Ask what photosynthesis is and it can answer accurately without touching the web, because the explanation has not changed in decades. Running retrieval for that question would add latency and cost for no gain. So before searching, the engine makes a judgement call: is stored knowledge sufficient and current enough for this?
Recency is the forcing function. Training data has a cutoff date, and anything published after it simply does not exist inside the model. Prices change, products launch, companies rebrand, rankings shift. Any question containing "latest", "current", or "best in 2026" implicitly demands information the model cannot guarantee from memory. Questions about specific companies, local services, or niche products also trigger retrieval, because training data covers them thinly and unevenly.
Engines set this dial differently. Perplexity treats retrieval as the default and searches for almost everything. ChatGPT decides per query, answering some questions from memory and searching for others. Google's AI Mode lives inside Search itself, so retrieval is the natural posture. Where the dial sits matters more than it sounds, because a wrong call here produces a fluent answer built entirely on stale knowledge.
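The dial can be sketched as a single function. This is a simplified illustration, not how any named engine makes the call: `should_search`, the `retrieval_default` flag, the `mentions_specific_entity` input and the recency pattern are hypothetical, and in practice the decision is made by a model rather than a regular expression.

```python
import re

# Assumed recency cues, taken from the examples in this section.
RECENCY_CUES = re.compile(r"\b(latest|current|today|best in \d{4})\b", re.IGNORECASE)


def should_search(
    question: str,
    mentions_specific_entity: bool,
    retrieval_default: bool = False,
) -> bool:
    """Return True when live retrieval should run before answering."""
    if retrieval_default:
        # Search-first posture: retrieve for almost everything.
        return True
    if RECENCY_CUES.search(question):
        # The question implies information newer than the training cutoff.
        return True
    # Companies, local services and niche products are thinly covered in training data.
    return mentions_specific_entity


print(should_search("what is photosynthesis", mentions_specific_entity=False))
print(should_search("best in 2026 CRM for consultancies", mentions_specific_entity=False))
```

With the flag off, the first call answers from memory and the second triggers retrieval. Flipping `retrieval_default` to `True` makes every question search, which trades latency and cost for less risk of a stale answer.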
Step 3: retrieving sources
Now the sub-queries from step 1 fan out. Instead of running one search, the engine fires several in parallel, each targeting a different facet of the question. A recommendation query might spawn one search for product comparisons, one for reviews, one for pricing, and one for recent news about the category. This is why a single AI answer can draw on sources that no single traditional search would have surfaced together.
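Fan-out is easy to sketch with a thread pool. In the example below, `search_index` is a hypothetical stand-in that returns made-up `example.com` URLs instead of calling a real index, and `fan_out` is an assumed name. The shape is what matters: several searches in parallel, then one merged, de-duplicated candidate list.

```python
from concurrent.futures import ThreadPoolExecutor


def search_index(sub_query: str) -> list[str]:
    """Hypothetical stand-in for an index lookup; returns fake result URLs."""
    slug = sub_query.lower().replace(" ", "-")
    return [f"https://example.com/{slug}/{rank}" for rank in (1, 2)]


def fan_out(sub_queries: list[str]) -> list[str]:
    """Run every sub-query in parallel and merge results, dropping duplicates."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map keeps results in the same order as the sub-queries.
        result_lists = list(pool.map(search_index, sub_queries))

    seen: set[str] = set()
    candidates: list[str] = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                candidates.append(url)
    return candidates


candidates = fan_out(
    ["CRM for small teams", "CRM pricing comparison", "lightweight CRM reviews"]
)
print(len(candidates), "candidate pages")
```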
Each search runs against an index. Which index depends on the engine: Copilot is grounded in Bing, Gemini and AI Mode draw on Google's index, Brave runs its own independent index, and others blend their own crawls with partner data. This is the step where classic search rankings still earn their keep. A page that ranks nowhere in the underlying index never enters the candidate pool, no matter how good it is.
The engine then fetches the candidate pages, splits them into passages, and scores each passage for semantic relevance to the question rather than keyword overlap. Only the best passages survive, because the model's working context is finite and the engine cannot paste the entire web into it. The synthesis that follows works from this shortlist, not from the open web.
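The shortlist step can be sketched as "rank by similarity, then keep what fits". The example assumes each passage already has an embedding vector. The two-number vectors below are invented for illustration, a word count stands in for a token count, and `shortlist` is a hypothetical name. A real engine would get its vectors from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(
    query_vec: list[float],
    passages: list[tuple[str, list[float]]],
    word_budget: int,
) -> list[str]:
    """Keep the most relevant passages that fit inside a fixed context budget."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    kept: list[str] = []
    used = 0
    for text, _vec in ranked:
        cost = len(text.split())  # crude proxy for tokens
        if used + cost > word_budget:
            continue  # does not fit; try the next, possibly shorter, passage
        kept.append(text)
        used += cost
    return kept


passages = [
    ("CRM pricing starts per seat each month", [0.9, 0.1]),
    ("Our company history since founding", [0.1, 0.9]),
    ("Small teams prefer lightweight CRM tools", [0.8, 0.3]),
]
print(shortlist([1.0, 0.0], passages, word_budget=13))
```

With a budget of 13 words, the two CRM passages fill the context and the off-topic one is dropped, even though it is the shortest.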
Step 4: synthesising the answer
Synthesis is where the language model finally writes. It receives your question plus the surviving passages and is instructed to answer using them. The technique is called grounding: anchoring the model's output to specific retrieved text instead of letting it generate freely from training data. Grounding is the difference between an AI search engine and a bare chatbot.
Writing the answer involves real editorial judgement. Sources disagree, so the model has to weigh them, favour the more credible or more recent claim, and sometimes note the disagreement outright. It also chooses structure. A comparison becomes a table, a process becomes steps, a simple fact becomes two sentences. The best engines match the shape of the answer to the shape of the question.
Grounding reduces fabrication but does not eliminate it. If retrieval came back thin, the model fills the gaps from its own parameters, and it fills them in the same confident register it uses for well-sourced claims. Nothing in the prose signals which sentences rest on retrieved evidence and which were generated from memory. That distinction matters in the next step.
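In practice, grounding comes down to how the prompt is assembled. The sketch below shows one plausible shape. The instruction wording and the `build_grounded_prompt` name are assumptions, not any engine's actual prompt.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from numbered passages."""
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers in square brackets. "
        "If the passages do not cover something, say so.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "best CRM for a five-person consultancy",
    [
        "CRM pricing starts per seat each month",
        "Small teams prefer lightweight CRM tools",
    ],
)
print(prompt)
```

Note what the prompt cannot do. It asks the model to stay inside the passages, but nothing enforces that, which is why thin retrieval still produces confident gap-filling.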
Step 5: citing and presenting
Finally the engine decides what to show. It selects which sources get cited, where the citation markers sit in the text, and how the answer is presented: inline footnotes, source cards, suggested follow-ups. The selection is a genuine cut. Ahrefs research found that ChatGPT cites only around 50% of the URLs it actually retrieves. Half the pages that informed an answer can receive no visible credit at all.
Presentation differs by engine. Perplexity numbers its citations inline and lists sources prominently. ChatGPT tends to attach a smaller set of links. Google's AI Mode blends the answer with familiar search furniture. Whatever the layout, what you see is the survivor of four earlier rounds of filtering.
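The gap between retrieved and cited is easy to see in a toy example. The sketch assumes the answer uses bracketed markers such as `[1]` that point at positions in the retrieved list. The URLs, the answer text and the `cited_sources` name are invented for illustration, and the two-of-four split is not data.

```python
import re


def cited_sources(answer: str, retrieved: list[str]) -> list[str]:
    """Return the retrieved URLs whose position is referenced in the answer."""
    numbers = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    # Markers pointing outside the retrieved list are ignored.
    return [url for i, url in enumerate(retrieved, start=1) if i in numbers]


retrieved = [
    "https://example.com/crm-comparison",
    "https://example.com/crm-pricing",
    "https://example.com/crm-reviews",
    "https://example.com/crm-news",
]
answer = "Lightweight CRMs suit small teams [1]. Reviews favour simple setup [3]."

cited = cited_sources(answer, retrieved)
print(f"{len(cited)} of {len(retrieved)} retrieved pages were cited")
```

The pricing and news pages were in the model's context and may have shaped the answer, yet neither gets a link.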

Where the pipeline goes wrong
A five-step pipeline has five places to fail, and the failures compound because each step trusts the output of the one before it. Three patterns account for most bad answers.
Confident errors. When step 2 wrongly skips retrieval, or step 3 returns thin results, the model writes anyway. It does not lower its tone to match its evidence. Worse, a citation can sit beside a sentence the source does not actually support, because citation placement is itself a model judgement. The answer looks verified while parts of it are not.
Stale and unstable retrieval. Indexes lag the live web, so retrieval can serve a snapshot of a page that has since changed. The outputs are volatile too. Ahrefs measured Google's AI Overviews changing every 2.15 days on average, with 70% of the content drifting between versions. The answer you saw on Monday is not the answer your customer sees on Thursday.
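If you save two versions of an answer, a rough drift score takes a few lines. The sketch below uses word-level similarity from Python's standard `difflib`. It is one simple way to quantify change and is not the method behind the Ahrefs figures. The `drift` name is an assumption.

```python
from difflib import SequenceMatcher


def drift(old_answer: str, new_answer: str) -> float:
    """Share of wording that changed between two answers, from 0.0 to 1.0."""
    old_words, new_words = old_answer.split(), new_answer.split()
    similarity = SequenceMatcher(None, old_words, new_words).ratio()
    return 1.0 - similarity


monday = "Acme and Birch lead for small teams"
thursday = "Acme and Birch lead for small teams"
print(drift(monday, thursday))  # identical text, so no drift
```

A score near 0 means the wording barely moved. A score near 1 means the answer was effectively rewritten, even if the question never changed.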
Source narrowing. Fan-out sounds broad, but the ranking layers converge on familiar territory: high-authority domains and formats the engines have learned to trust. In ChatGPT's case, the same Ahrefs research found that 43.8% of cited pages are "Best X" listicles. A pipeline that can, in principle, draw on the whole indexed web ends by quoting a surprisingly small slice of it.
Why each step matters for brands
Each step in the pipeline is a filter, and your brand has to survive all five to appear in the answer a buyer reads. Most brands fall out somewhere in the middle without anyone noticing, because the pipeline reports nothing about what it discarded.
- Step 1 sets the vocabulary. If buyers phrase the category differently from your own copy, the sub-queries may never point at pages that mention you
- Step 2 decides whether your current positioning gets a hearing. No retrieval means the model answers from training impressions that can be years old
- Step 3 requires presence in the underlying indexes. A page that is not retrieved cannot be synthesised
- Step 4 rewards pages a model can quote cleanly: clear claims, concrete specifics, scannable structure
- Step 5 is the final cut. Being retrieved is not being cited, and being cited is not being named in the answer text
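If you can capture what an engine retrieved, what it cited and what it wrote, finding where a brand dropped out is a short chain of checks. The sketch below assumes you have those three things as plain text, which most engines do not expose directly. The `exit_stage` name, the stage labels and the "Acme" and "Birch" brands are hypothetical.

```python
def exit_stage(
    brand: str,
    retrieved_pages: list[str],
    cited_pages: list[str],
    answer_text: str,
) -> str:
    """Report the first point in the pipeline where the brand disappears."""
    name = brand.lower()
    if not any(name in page.lower() for page in retrieved_pages):
        return "retrieval"      # no retrieved page mentions the brand
    if not any(name in page.lower() for page in cited_pages):
        return "citation"       # retrieved, but lost at the citation cut
    if name not in answer_text.lower():
        return "answer text"    # cited, but never named in the prose
    return "named"


retrieved_pages = ["Acme CRM review for small teams", "Birch CRM pricing guide"]
cited_pages = ["Birch CRM pricing guide"]
answer_text = "Birch is a common pick for small consultancies."

print(exit_stage("Acme", retrieved_pages, cited_pages, answer_text))
print(exit_stage("Birch", retrieved_pages, cited_pages, answer_text))
```

In this made-up run, Acme survives retrieval but falls at the citation cut, while Birch makes it all the way into the answer text.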
What tips the model towards naming one brand over another at synthesis time is a topic of its own, covered in how AI models choose which brands to recommend. The short version: third-party evidence weighs heavily, because the pipeline treats independent sources as more quotable than your own marketing pages.
This filtering is also why one-off manual checks mislead. An answer is the output of a volatile pipeline, not a stable ranking. Honeyb runs the same buyer prompts across ChatGPT, Gemini, Claude, Perplexity, and Google's AI surfaces on a schedule, then tracks which brands and sources each engine names over time, so you can see whether you are falling out at retrieval or at the citation cut.
If you want a snapshot of where your own brand exits the pipeline, run a free AI visibility check. It shows whether the main engines mention you for your category, and who they cite instead.
Frequently asked questions
Does AI search use Google's index? Some engines do. Google's AI Mode and Gemini draw on Google's index, Microsoft Copilot is grounded in Bing, and Brave Search runs its own independent index. The practical implication is that retrieval always runs against some underlying index, so the engines you care about determine which indexes your pages need to rank in.
How long does the whole pipeline take? A few seconds for most questions. The sub-query fan-out runs in parallel, and most engines stream the answer as the model writes it rather than waiting for the full text. Deeper research modes that read far more sources can take several minutes, which is the trade for more thorough retrieval.
Can an answer be wrong even when it shows citations? Yes. Citations show which sources the engine chose to display, not that every sentence was checked against them. If retrieval came back thin, the model fills gaps from training data in the same confident tone, and a citation can sit next to a claim the source never makes. Treat cited answers as a strong starting point, not a verdict.
Is AI search just a chatbot connected to the internet? No. A bare chatbot generates from training data alone. AI search adds question decomposition, a retrieval layer that searches live indexes, grounding that anchors the answer to retrieved text, and a citation layer that exposes sources. Those middle steps are what make the output checkable, and they are where most of the engineering effort sits.




