Ask Perplexity a question and it does something a standalone chatbot does not: it runs a live web search, reads the results, and writes a direct answer with numbered footnotes linking back to the exact pages it used. The footnotes are not decoration. They are the product. Perplexity is built so that the language model is not supposed to assert anything it did not first retrieve, and that single constraint is what separates an answer engine from a text generator that confidently invents. Understanding the pipeline behind those footnotes explains both why Perplexity stays current and where it still gets things wrong.
This guide walks through that pipeline one stage at a time, answers the question everyone asks first (does Perplexity use ChatGPT?), covers Pro Search and the heavily rebuilt Deep Research mode, and sets out the limits worth knowing before you rely on it. If you want the wider category first, our explainer on how AI search works covers the landscape.
The short answer
Perplexity works by combining a search engine with a large language model. When you ask a question it does not answer from memory the way a standalone chatbot does. It runs a real-time web search, retrieves and ranks candidate sources, then passes the most relevant passages to a language model that synthesises an answer constrained by what it retrieved. The technical name for this design is retrieval-augmented generation, or RAG. Every claim is meant to trace back to a cited source, which is why the inline footnotes are the defining feature of the interface.

Perplexity's own engineering philosophy puts retrieval first. As the company frames it, the goal is to solve search first, then use it to solve everything else. The model is downstream of the search system, not the other way round, and that ordering shapes every design decision that follows.
The retrieval-augmented pipeline, step by step
A single query moves through several discrete stages before you see an answer. The sequence below reflects how Perplexity and its infrastructure partners describe the flow.
1. Query understanding. A language model parses your question to work out intent, the entities involved, and what kind of answer is needed. A factual lookup, a comparison, and an open research question each trigger different search strategies.
2. Live web search. Rather than relying on a fixed training cutoff, Perplexity issues real-time searches against its web index. For complex questions it may run several searches in parallel, each targeting a different facet of the query.
3. Source retrieval. Candidate documents come back as a large pool of pages, articles, and data snippets. Retrieval is hybrid: it blends lexical matching (traditional keyword scoring) with dense vector embeddings that capture semantic meaning, so a page can match on wording or on concept.
4. Ranking and filtering. The candidate pool is narrowed through multiple ranking stages. Early stages use fast lexical and embedding scorers; later stages apply learned models and cross-encoders that weigh relevance, freshness, and source quality. Only the strongest passages survive.
5. Answer synthesis. The top-ranked passages are assembled into a structured prompt and handed to a language model, which writes the answer using that evidence. Because the source passages carry identifiers, the model can attach citations as it writes.
6. Inline citation. The finished answer is returned with numbered footnotes, each linking to the page it drew from, so you can verify any claim directly.
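The retrieval, ranking, and synthesis stages above can be sketched in a few lines. This is a toy illustration, not Perplexity's implementation: the passages and URLs are hypothetical, the character-trigram vector is only a stand-in for a real dense embedding, and the blending weight `alpha` is an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical passages standing in for retrieved web content.
PASSAGES = [
    {"url": "https://example.com/a",
     "text": "Vespa supports hybrid lexical and vector ranking"},
    {"url": "https://example.com/b",
     "text": "Llama models are trained by Meta"},
    {"url": "https://example.com/c",
     "text": "Hybrid search blends keyword scoring with embeddings"},
]

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def lexical_score(query, text):
    """Fraction of query terms that appear in the passage."""
    q = set(tokens(query))
    return len(q & set(tokens(text))) / len(q) if q else 0.0

def trigram_vector(text):
    """Stand-in for a dense embedding: character trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_rank(query, passages, alpha=0.5, top_k=2):
    """Blend lexical and vector scores, then keep the top_k passages."""
    qv = trigram_vector(query)
    scored = [
        (alpha * lexical_score(query, p["text"])
         + (1 - alpha) * cosine(qv, trigram_vector(p["text"])), p)
        for p in passages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

def build_prompt(query, ranked):
    """Number each passage so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] {p['url']}: {p['text']}" for i, p in enumerate(ranked, start=1)
    )
    return ("Answer using only the sources below and cite them as [n].\n\n"
            f"{sources}\n\nQuestion: {query}")

top = hybrid_rank("hybrid keyword search", PASSAGES)
print(build_prompt("hybrid keyword search", top))
```

The numbered identifiers in the prompt are what let the footnotes in the final answer map back to specific pages.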
Perplexity runs the retrieval and ranking layers on Vespa, a search platform built for exactly this kind of hybrid, multi-stage workload. Vespa's engineering write-up describes how the system "fuses lexical, vector, and metadata signals in a unified ranking pipeline" and supports chunk-level retrieval, "treating both documents and their internal sections as retrievable units." That last detail matters more than it sounds. By retrieving sections rather than whole pages, Perplexity can hand the model only the most relevant spans, which the write-up says "improves factual accuracy, reduces context length, and minimises compute cost." You can read the technical detail in Vespa's account of the system. For the building blocks in general, our piece on the key technologies behind AI search covers embeddings, vector search, and rerankers.
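A minimal sketch of why chunk-level retrieval shortens the context, assuming a page splits cleanly on blank lines and using plain keyword overlap as the scorer. Both are simplifications for illustration; the page text and the `page-a` identifier are hypothetical, and this does not use Vespa's API.

```python
import re

# Hypothetical page with three sections separated by blank lines.
DOCS = {
    "page-a": (
        "Company history and founding team.\n\n"
        "Pricing starts at twenty dollars per month.\n\n"
        "Contact details and office address."
    )
}

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def split_into_chunks(doc_id, text):
    """Treat each blank-line-separated section as its own retrievable unit."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"doc": doc_id, "chunk": i, "text": p} for i, p in enumerate(parts)]

def best_chunks(query, docs, top_k=1):
    """Score every section of every document and keep the best top_k."""
    chunks = [c for doc_id, text in docs.items()
              for c in split_into_chunks(doc_id, text)]
    q = terms(query)
    chunks.sort(key=lambda c: len(q & terms(c["text"])), reverse=True)
    return chunks[:top_k]

best = best_chunks("pricing per month", DOCS)
print(best[0]["text"])
print(len(best[0]["text"]), "characters instead of", len(DOCS["page-a"]))
```

Only the pricing section reaches the model, so the prompt carries a fraction of the page's text.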
Does Perplexity use ChatGPT?
This is the most common question about Perplexity, and the answer is: partly, but not exclusively. Perplexity is deliberately model-agnostic. Its architecture routes queries to a mix of language models rather than depending on any single one. As one neutral technical breakdown puts it, the system "leverages a heterogeneous mix of models, including in-house fine-tuned models from the 'Sonar' family and third-party frontier models from leading labs like OpenAI (GPT series) and Anthropic (Claude series)." The detail is set out in ByteByteGo's analysis of the architecture.
In practice the default fast experience usually runs on Sonar, Perplexity's own model. Sonar is built on a Meta Llama base (Llama 3.3 70B) and fine-tuned for fast, factual, citation-grounded answers. It runs on Cerebras inference hardware, which is what makes the default mode feel near-instant: the partnership announcement states the setup processes roughly 1,200 tokens per second. Perplexity puts that at nearly ten times the decoding throughput of a comparable model such as Gemini 2.0 Flash. The joint announcement is in Cerebras's write-up on powering Sonar.
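A back-of-envelope calculation puts the quoted throughput in perspective. The 500-token answer length is an assumption for illustration, the baseline is simply one tenth of the quoted rate (the source says "nearly ten times"), and the figures cover decoding only, not search, ranking, or time to first token.

```python
def decode_seconds(answer_tokens, tokens_per_second):
    """Time spent generating the answer text at a steady decode rate."""
    return answer_tokens / tokens_per_second

ANSWER_TOKENS = 500          # assumed answer length
SONAR_RATE = 1200            # tokens per second, as quoted in the announcement
BASELINE_RATE = SONAR_RATE / 10  # rough reading of "nearly ten times"

print(f"{decode_seconds(ANSWER_TOKENS, SONAR_RATE):.2f} s at the quoted rate")
print(f"{decode_seconds(ANSWER_TOKENS, BASELINE_RATE):.2f} s at one tenth of it")
```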
Paying users can override that default and pick the model behind their answers. As of mid-2026 the Pro picker spans Sonar and current frontier releases, including GPT-5.x, Claude Opus 4.5, and Gemini 3 Pro, alongside a Model Council mode that runs the same query across several frontier models at once and reconciles their answers. The roster turns over quickly, so treat any specific list as a snapshot rather than a fixed menu.
So Perplexity can use the same underlying models as ChatGPT, but it is not ChatGPT. The product is the orchestration layer around those models: the search, retrieval, ranking, and citation system. That is the part that does not change when you swap the model.
Why the model-agnostic approach matters
Treating the language model as a swappable component has practical consequences. It lets Perplexity match the model to the task: a quick fact can go to a fast, cheap model while a dense analytical question can go to a frontier reasoning model. It avoids dependence on any one provider. And it means answer quality is governed as much by retrieval and ranking quality as by the raw model. A weaker model with excellent, fresh sources will often beat a stronger model working from stale or thin evidence. This is the same dynamic that decides which brands surface in answers, which we unpack in how AI models choose which brands to recommend.
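The idea of matching the model to the task can be sketched as a small router. Everything here is hypothetical: the model names are placeholders, and the cue words and length threshold are invented heuristics. How Perplexity actually routes queries is not described in the sources above.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str
    reason: str

# Assumed cue words that hint at an analytical question.
ANALYTICAL_CUES = {"compare", "versus", "why", "analyse", "analyze", "evaluate"}

def route(query: str, long_query_words: int = 25) -> Route:
    """Send long or analytical queries to a stronger model, the rest to a fast one."""
    words = re.findall(r"[a-z]+", query.lower())
    if len(words) >= long_query_words or ANALYTICAL_CUES & set(words):
        return Route("frontier-reasoning-model", "analytical or long query")
    return Route("fast-default-model", "short factual lookup")

print(route("capital of France"))
print(route("Compare Vespa and Elasticsearch for hybrid search"))
```

Because the router sits in front of the models, swapping a provider means changing a string, not the retrieval or citation logic.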
[Chart: monthly US search demand for "perplexity ai"]
Demand for Perplexity has grown alongside the broader shift towards AI search. The chart above tracks US search interest in the brand, and it sits within the wider category move from links to answers that the rest of this guide unpacks.
Pro Search and Deep Research modes
Beyond the default answer mode, Perplexity offers two heavier modes for harder questions.
Pro Search is an agentic version of the standard flow. Instead of running one search and answering, the model is given two tools: a web search tool that runs targeted queries, and a URL retrieval tool that fetches and analyses the full content of a specific page rather than a snippet. The model decides which tool to use and when, building a research workflow tailored to each query. This multi-step reasoning is what lets Pro Search break a complicated question into parts, search each, and combine the findings, usually in tens of seconds.
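A minimal sketch of that tool loop follows. The two tools are stubs that return made-up data, and `decide` is a scripted stand-in for the model's choice; a real system would ask a language model which tool to call next. All names are hypothetical.

```python
def web_search(query):
    """Stub search tool: returns one hypothetical result URL per query."""
    return [f"https://example.com/{query.replace(' ', '-')}"]

def fetch_url(url):
    """Stub retrieval tool: pretends to return the full page text."""
    return f"full text of {url}"

TOOLS = {"web_search": web_search, "fetch_url": fetch_url}

def decide(state):
    """Pick the next action. Pops the chosen item from the state's queues."""
    if state["pending_queries"]:
        return "web_search", state["pending_queries"].pop(0)
    if state["unread_urls"]:
        return "fetch_url", state["unread_urls"].pop(0)
    return "answer", None

def run_agent(sub_questions, max_steps=10):
    state = {"pending_queries": list(sub_questions),
             "unread_urls": [], "notes": [], "trace": []}
    for _ in range(max_steps):
        tool, arg = decide(state)
        state["trace"].append(tool)
        if tool == "answer":
            break  # enough evidence gathered; hand the notes to synthesis
        result = TOOLS[tool](arg)
        if tool == "web_search":
            state["unread_urls"].extend(result)  # queue pages for a full read
        else:
            state["notes"].append(result)        # keep page text as evidence
    return state

state = run_agent(["sonar latency", "vespa ranking"])
print(state["trace"])
```

The `max_steps` cap is a design choice worth keeping in any real loop, since it bounds cost when the policy never decides it has enough evidence.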
Deep Research goes much further, and in 2026 it was rebuilt. The mode now lives inside Perplexity Computer, a cloud system that launched in late February 2026 and coordinates more than 20 frontier models in a single workflow, with a top reasoning model acting as the orchestrator and others handling specialised subtasks. Under the hood it uses a technique Perplexity calls Search as Code, where the system writes code that runs thousands of retrieval steps in parallel, tailored to each question. The output is no longer just a text report: Deep Research now produces work-ready reports, presentation decks, and dashboards. Perplexity reports the rebuild lifted its BrowseComp accuracy from 40.7% to 83.8%. A run typically takes a few minutes rather than the seconds a normal answer takes.
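The general idea of generating many retrieval steps programmatically and running them in parallel can be illustrated as below. This is not Perplexity's Search as Code implementation, whose internals are not described here; the facet expansion, the stub `search` function, and the example topic are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def expand(topic, facets, regions):
    """Generate one query per facet and region combination."""
    return [f"{topic} {facet} {region}" for facet, region in product(facets, regions)]

def search(query):
    """Stub for a network-bound search call; returns a hypothetical hit."""
    return {"query": query,
            "hits": [f"https://example.com/?q={query.replace(' ', '+')}"]}

def run_parallel(queries, workers=8):
    """Run all searches concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search, queries))

queries = expand("ev charging", ["pricing", "market size"], ["US", "EU", "UK"])
results = run_parallel(queries)
print(len(results), "searches completed")
```

Threads suit this shape of work because each search spends most of its time waiting on the network rather than computing.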
The modes compared
| Mode | What it does | Best for | Typical speed |
|---|---|---|---|
| Default (Sonar) | One search pass, fast cited answer | Quick facts and everyday questions | Seconds |
| Pro Search | Agentic, multi-step tool use across searches and URLs | Comparisons and multi-part questions | Tens of seconds |
| Deep Research | Many parallel retrievals, reads hundreds of sources, builds a report, deck, or dashboard | Market research, literature scans, due diligence | A few minutes |
Plans shift often, so check the current tiers before relying on them. As of mid-2026 the free tier gives basic cited answers plus a small daily allowance of Pro Searches and Deep Research runs. Pro, at $20 a month (or $200 a year), lifts those caps, unlocks the model switcher, and includes Perplexity's Comet browser, which dropped its paywall in March 2026. The $200-a-month Max tier adds far higher Computer credits for heavy Deep Research use.
Is Perplexity good? Strengths and limits
On balance, Perplexity is strong at what it was built for: current, citation-backed answers and fast source discovery. The retrieval-first design gives it a structural advantage over pure chatbots, because grounding answers in retrieved evidence reduces the room for invention and the citations give you a way to check.
The hard numbers back this up, with an important caveat. When Columbia University's Tow Center for Digital Journalism tested eight AI search engines on 200 news queries, the field got the source wrong more than 60% of the time on average. Perplexity posted the lowest failure rate at 37%, the best of the group, as reported in the Columbia Journalism Review study. Best in class still meant more than one cited claim in three was off. The study also found a counterintuitive pattern worth remembering: premium tiers answered more prompts but had *higher* error rates than free ones, because they were more willing to give a definitive but wrong answer instead of declining. Confidence is not a proxy for correctness.
That points to the limits worth naming:
- Citation quality varies. Perplexity sometimes attaches a citation to a source that says something slightly different, says it in a different context, or in rare cases does not clearly support the claim at all. The footnote tells you where to check, not that the check passed.
- It can be overconfident when results are thin. If the live web has little good material on a question, the answer can still read as assured. Sparse evidence is where errors cluster.
- It is built for synthesis, not authorship. Perplexity excels at researching and summarising. For long-form writing, planning, or open-ended creative work, a general-purpose assistant is usually the better tool.
- Source freshness is only as good as the index. Very recent events or paywalled material may be retrieved incompletely.
The practical takeaway: treat Perplexity as a research accelerator, and treat its citations as an invitation to verify rather than proof. The trade-off against traditional search is real: a traditional engine hands you the raw links and leaves all the reading to you, while Perplexity does the reading for you and asks you to trust, or check, its summary.
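Verification can be partly triaged with a crude check before you read anything by hand. The sketch below flags citations whose cited passage shares few terms with the claim. It is a lexical heuristic only, the threshold and stop-word list are assumptions, and the example claims are invented; a passing score does not mean the source supports the claim.

```python
import re

# Small assumed stop-word list; a real one would be longer.
STOP = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "at", "on", "for"}

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support(claim, passage):
    """Share of the claim's content words that also appear in the passage."""
    content = terms(claim) - STOP
    return len(content & terms(passage)) / len(content) if content else 0.0

def flag_weak(citations, threshold=0.6):
    """Return footnote numbers whose passage barely overlaps the claim."""
    return [c["n"] for c in citations
            if support(c["claim"], c["passage"]) < threshold]

citations = [
    {"n": 1, "claim": "Sonar is built on Llama 3.3 70B",
     "passage": "Sonar is built on top of Llama 3.3 70B and fine-tuned for search"},
    {"n": 2, "claim": "Revenue grew 40% in 2025",
     "passage": "The company opened a new office in 2025"},
]
print(flag_weak(citations))
```

Flagged footnotes are the ones to open first; unflagged ones still deserve a read when the claim matters.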
What this means for brands
Because Perplexity answers by retrieving and citing live sources, the brands and pages it mentions are not chosen by a model guessing from training data. They are chosen because they were retrieved, ranked, and judged relevant at query time. That makes Perplexity a measurable surface: the same question can be asked repeatedly to see which sources it cites, which brands make a shortlist, and how it describes them. With Comet now free across desktop and mobile, those cited answers reach a wider audience than ever, which raises the stakes on being one of the sources that gets retrieved.
The catch is variance. Results shift with phrasing and move as the index updates, so spot-checking one query once tells you almost nothing. Tracking the same questions on a schedule is what reveals the pattern, and it is the case for monitoring answer engines systematically rather than by hand, which we make in why spot-checking fails. You can run a first check on your own brand with our free AI visibility checker.
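Scheduled tracking reduces to a simple measurement: across repeated runs of the same question, how often does each domain get cited? The sketch below assumes you have already collected the cited URLs for each run; the URLs are hypothetical and no Perplexity API is called.

```python
from collections import Counter
from urllib.parse import urlparse

def citation_share(runs):
    """Fraction of runs in which each domain was cited at least once.

    runs: one list of cited URLs per repeated query.
    """
    if not runs:
        return {}
    counts = Counter()
    for cited in runs:
        # Count each domain once per run, however many pages it contributed.
        counts.update({urlparse(url).netloc for url in cited})
    return {domain: n / len(runs) for domain, n in counts.items()}

# Hypothetical citations from four runs of the same question.
runs = [
    ["https://a.example/x", "https://b.example/y"],
    ["https://a.example/z"],
    ["https://a.example/x", "https://a.example/q", "https://c.example/"],
    ["https://b.example/y"],
]
for domain, share in sorted(citation_share(runs).items()):
    print(f"{domain}: cited in {share:.0%} of runs")
```

A single run would have shown either one or two of these domains; only the aggregate shows that `a.example` is the consistent source.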




