    Published June 15, 2026 · 11 min

    How Does Perplexity AI Work?

    Perplexity is an answer engine built on retrieval-augmented generation. This guide explains how it parses a query, searches the live web, ranks sources on Vespa, and writes a cited answer; which models it routes to in 2026; and where it still falls short.

    Matiss Katanenko

    Co-founder, Honeyb

    Ask Perplexity a question and it does something a standalone chatbot does not: it runs a live web search, reads the results, and writes a direct answer with numbered footnotes that link back to the exact pages it used. The footnotes are not decoration. They are the product. Perplexity is built so that the language model is not supposed to assert anything it did not first retrieve, and that single constraint is what separates an answer engine from a text generator that confidently invents. Understanding the pipeline behind those footnotes explains both why Perplexity stays current and where it still gets things wrong.

    This guide walks through that pipeline one stage at a time, answers the question everyone asks first (does Perplexity use ChatGPT?), covers Pro Search and the heavily rebuilt Deep Research mode, and sets out the limits worth knowing before you rely on it. If you want the wider category first, our explainer on how AI search works covers the landscape.

    The short answer

    Perplexity works by combining a search engine with a large language model. When you ask a question it does not answer from memory the way a standalone chatbot does. It runs a real-time web search, retrieves and ranks candidate sources, then passes the most relevant passages to a language model that synthesises an answer constrained by what it retrieved. The technical name for this design is retrieval-augmented generation, or RAG. Every claim is meant to trace back to a cited source, which is why the inline footnotes are the defining feature of the interface.

    [Figure: Perplexity answers a query with a direct summary and numbered citations linking back to each source.]

    Perplexity's own engineering philosophy puts retrieval first. As the company frames it, the goal is to solve search first, then use it to solve everything else. The model is downstream of the search system, not the other way round, and that ordering shapes every design decision that follows.

    The retrieval-augmented pipeline, step by step

    A single query moves through several discrete stages before you see an answer. The sequence below reflects how Perplexity and its infrastructure partners describe the flow.

    1. Query understanding. A language model parses your question to work out intent, the entities involved, and what kind of answer is needed. A factual lookup, a comparison, and an open research question each trigger different search strategies.
    2. Live web search. Rather than relying on a fixed training cutoff, Perplexity issues real-time searches against its web index. For complex questions it may run several searches in parallel, each targeting a different facet of the query.
    3. Source retrieval. Candidate documents come back as a large pool of pages, articles, and data snippets. Retrieval is hybrid: it blends lexical matching (traditional keyword scoring) with dense vector embeddings that capture semantic meaning, so a page can match on wording or on concept.
    4. Ranking and filtering. The candidate pool is narrowed through multiple ranking stages. Early stages use fast lexical and embedding scorers; later stages apply learned models and cross-encoders that weigh relevance, freshness, and source quality. Only the strongest passages survive.
    5. Answer synthesis. The top-ranked passages are assembled into a structured prompt and handed to a language model, which writes the answer using that evidence. Because the source passages carry identifiers, the model can attach citations as it writes.
    6. Inline citation. The finished answer is returned with numbered footnotes, each linking to the page it drew from, so you can verify any claim directly.
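
    As a rough illustration of stages 3 and 5, the sketch below blends a keyword score with a similarity score and then numbers the surviving passages so a model could cite them. It is a toy under stated assumptions, not Perplexity's implementation: the corpus, the character-trigram "embedding", the `alpha` weight, and every function name are hypothetical, and the multi-stage ranking is collapsed into a single scoring pass.

```python
import math
from collections import Counter

# Toy corpus; a real system would hold passages from a web index.
PASSAGES = {
    "p1": "Vespa is a search platform for hybrid lexical and vector ranking",
    "p2": "Perplexity cites numbered sources in every answer",
    "p3": "Padel is a racket sport played in doubles",
}

def tokens(text):
    return text.lower().split()

def lexical_score(query, passage):
    """Fraction of query terms that appear in the passage (keyword match)."""
    q = set(tokens(query))
    if not q:
        return 0.0
    return len(q & set(tokens(passage))) / len(q)

def embed(text):
    """Hypothetical stand-in for a dense embedding: a bag of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query, k=2, alpha=0.5):
    """Blend the lexical and similarity scores, then keep the top k passage ids."""
    q_vec = embed(query)
    scored = {
        pid: alpha * lexical_score(query, text) + (1 - alpha) * cosine(q_vec, embed(text))
        for pid, text in PASSAGES.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

def build_prompt(query, passage_ids):
    """Number the passages so the model can cite them as [1], [2], ..."""
    sources = [f"[{i}] {PASSAGES[pid]}" for i, pid in enumerate(passage_ids, start=1)]
    header = "Answer using only these sources and cite them by number."
    return "\n".join([header, *sources, f"Question: {query}"])

top = hybrid_retrieve("hybrid search platform ranking")
prompt = build_prompt("hybrid search platform ranking", top)
```

    Because each passage enters the prompt with a number attached, a citation in the answer can be mapped back to a specific source, which is the property stage 6 relies on.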

    Perplexity runs the retrieval and ranking layers on Vespa, a search platform built for exactly this kind of hybrid, multi-stage workload. Vespa's engineering write-up describes how the system "fuses lexical, vector, and metadata signals in a unified ranking pipeline" and supports chunk-level retrieval, "treating both documents and their internal sections as retrievable units." That last detail matters more than it sounds. By retrieving sections rather than whole pages, Perplexity can hand the model only the most relevant spans, which the write-up says "improves factual accuracy, reduces context length, and minimises compute cost." You can read the technical detail in Vespa's account of the system. For the building blocks in general, our piece on the key technologies behind AI search covers embeddings, vector search, and rerankers.
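
    Chunk-level retrieval can be pictured with a few lines of code. The sketch below is a simplified assumption, not Vespa's API: it splits a document on blank lines, scores each section by word overlap with the query, and returns only the best section. The function names and the sample document are hypothetical.

```python
def split_into_chunks(doc_id, text):
    """Treat each blank-line-separated section as its own retrievable unit."""
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    return [{"doc": doc_id, "section": i, "text": s} for i, s in enumerate(sections)]

def overlap(query, text):
    """Crude relevance: number of query words present in the chunk."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def best_chunks(query, docs, k=1):
    """Score every section of every document and keep the top k sections."""
    chunks = [c for doc_id, text in docs.items() for c in split_into_chunks(doc_id, text)]
    return sorted(chunks, key=lambda c: overlap(query, c["text"]), reverse=True)[:k]

# Hypothetical two-section document.
docs = {
    "guide": "Pricing starts at 20 dollars a month.\n\nDeep Research runs take a few minutes.",
}
hit = best_chunks("how long do deep research runs take", docs)[0]
```

    Only the second section reaches the model here, so the prompt is shorter than it would be if the whole page were passed along.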

    Does Perplexity use ChatGPT?

    This is the most common question about Perplexity, and the answer is: partly, but not exclusively. Perplexity is deliberately model-agnostic. Its architecture routes queries to a mix of language models rather than depending on any single one. As one neutral technical breakdown puts it, the system "leverages a heterogeneous mix of models, including in-house fine-tuned models from the 'Sonar' family and third-party frontier models from leading labs like OpenAI (GPT series) and Anthropic (Claude series)." The detail is set out in ByteByteGo's analysis of the architecture.

    In practice the default fast experience usually runs on Sonar, Perplexity's own model. Sonar is built on a Meta Llama base (Llama 3.3 70B) and fine-tuned for fast, factual, citation-grounded answers. It runs on Cerebras inference hardware, which is what makes the default mode feel near-instant: the partnership announcement states the setup processes roughly 1,200 tokens per second. Perplexity puts that at nearly ten times the decoding throughput of a comparable model such as Gemini 2.0 Flash. The joint announcement is in Cerebras's write-up on powering Sonar.
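
    To see what that throughput means for a single answer, a quick back-of-the-envelope calculation helps. The 1,200 tokens per second comes from the announcement quoted above; the 300-token answer length is an assumption chosen for illustration, and "nearly ten times" is rounded to exactly ten. The figure covers decoding only, not search or network latency.

```python
def decode_seconds(answer_tokens, tokens_per_second):
    """Time spent generating the answer text, ignoring search and network latency."""
    return answer_tokens / tokens_per_second

ANSWER_TOKENS = 300              # assumed answer length, for illustration only
SONAR_TPS = 1200                 # figure quoted from the announcement
BASELINE_TPS = SONAR_TPS / 10    # "nearly ten times" treated as exactly 10x

fast = decode_seconds(ANSWER_TOKENS, SONAR_TPS)      # 0.25 seconds
slow = decode_seconds(ANSWER_TOKENS, BASELINE_TPS)   # 2.5 seconds
```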

    Paying users can override that default and pick the model behind their answers. As of mid-2026 the Pro picker spans Sonar plus current frontier releases including GPT-5.x, Claude Opus 4.5, and Gemini 3 Pro. It also offers a Model Council mode that runs the same query across several frontier models at once and reconciles their answers. The roster turns over quickly, so treat any specific list as a snapshot rather than a fixed menu.
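
    How Model Council reconciles answers is not described in the sources used here. One simple way to picture the idea is a majority vote, sketched below with stub functions standing in for real model calls. The voting rule, the stubs, and the `council` function are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for calls to different frontier models.
def model_a(question):
    return "Paris"

def model_b(question):
    return "Paris"

def model_c(question):
    return "Lyon"

def council(question, models):
    """Ask every model, keep the most common answer, and report the agreement rate."""
    answers = [m(question) for m in models]
    winner, votes = Counter(answers).most_common(1)[0]
    return {"answer": winner, "agreement": votes / len(answers), "all": answers}

result = council("What is the capital of France?", [model_a, model_b, model_c])
```

    A low agreement rate is a useful signal in its own right: it marks the questions where a single model's answer deserves a second look.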

    So Perplexity can use the same underlying models as ChatGPT, but it is not ChatGPT. The product is the orchestration layer around those models: the search, retrieval, ranking, and citation system. That is the part that does not change when you swap the model.

    Why the model-agnostic approach matters

    Treating the language model as a swappable component has practical consequences. It lets Perplexity match the model to the task: a quick fact can go to a fast, cheap model while a dense analytical question can go to a frontier reasoning model. It avoids dependence on any one provider. And it means answer quality is governed as much by retrieval and ranking quality as by the raw model. A weaker model with excellent, fresh sources will often beat a stronger model working from stale or thin evidence. This is the same dynamic that decides which brands surface in answers, which we unpack in how AI models choose which brands to recommend.
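
    Matching the model to the task can be sketched as a small router. The version below is a deliberately crude assumption: it looks only at query length and a handful of analytical keywords, and the tier names `fast-default` and `frontier-reasoning` are hypothetical labels, not Perplexity's. A production router would more plausibly use a learned classifier.

```python
ANALYTICAL_WORDS = {"compare", "why", "analyse", "analyze", "versus", "evaluate"}

def route(query, long_query_words=15):
    """Pick a model tier from surface features of the query."""
    words = query.lower().split()
    if len(words) > long_query_words or ANALYTICAL_WORDS & set(words):
        return "frontier-reasoning"
    return "fast-default"

quick = route("capital of Latvia")
dense = route("compare Vespa and Elasticsearch for hybrid ranking")
```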

    [Chart: Monthly US search volume for the query "perplexity ai". Demand for the answer engine rose through the period, a proxy for how fast consumers are adopting it. Source: Google Ads search volume, June 2025 to May 2026, retrieved via DataForSEO.]

    Demand for Perplexity has grown alongside the broader shift towards AI search. The chart above tracks US search interest in the brand, and it sits within the wider category move from links to answers that the rest of this guide unpacks.

    Pro Search and Deep Research modes

    Beyond the default answer mode, Perplexity offers two heavier modes for harder questions.

    Pro Search is an agentic version of the standard flow. Instead of running one search and answering, the model is given two tools: a web search tool that runs targeted queries, and a URL retrieval tool that fetches and analyses the full content of a specific page rather than a snippet. It decides which tools to use and when, building a research workflow tailored to each query. This multi-step reasoning is what lets Pro Search break a complicated question into parts, search each, and combine the findings, usually in tens of seconds.
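
    The tool-use loop can be sketched in a few lines. In the version below the two tools are stubs and the "model" is a scripted planner that searches, reads the top hit, and stops. All names, the example URL, and the stopping rule are hypothetical; in a real agent the planner would be a language model choosing the next step.

```python
# Hypothetical tools; real ones would call a search API and an HTTP fetcher.
def web_search(query):
    return [{"url": "https://example.com/a", "snippet": f"Result for {query}"}]

def fetch_url(url):
    return f"Full text of {url}"

TOOLS = {"web_search": web_search, "fetch_url": fetch_url}

def scripted_planner(question, history):
    """Stand-in for the model's decision: search, then read the top hit, then stop."""
    if not history:
        return ("web_search", question)
    if len(history) == 1:
        return ("fetch_url", history[0]["result"][0]["url"])
    return None  # enough evidence gathered

def run_agent(question, planner, max_steps=5):
    """Call the planner repeatedly, executing whichever tool it picks."""
    history = []
    for _ in range(max_steps):
        step = planner(question, history)
        if step is None:
            break
        tool, arg = step
        history.append({"tool": tool, "arg": arg, "result": TOOLS[tool](arg)})
    return history

trace = run_agent("best CRM for small agencies", scripted_planner)
```

    The `max_steps` cap is a design choice worth keeping in any such loop: it bounds cost and latency if the planner never decides it has enough evidence.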

    Deep Research goes much further, and in 2026 it was rebuilt. The mode now lives inside Perplexity Computer, a cloud system that launched in late February 2026 and coordinates more than 20 frontier models in a single workflow, with a top reasoning model acting as the orchestrator and others handling specialised subtasks. Under the hood it uses a technique Perplexity calls Search as Code, where the system writes code that runs thousands of retrieval steps in parallel, tailored to each question. The output is no longer just a text report: Deep Research now produces work-ready reports, presentation decks, and dashboards. Perplexity reports the rebuild lifted its BrowseComp accuracy from 40.7% to 83.8%. A run typically takes a few minutes rather than the seconds a normal answer takes.
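
    Perplexity has not published the internals of Search as Code in the material cited here, but the core idea of fanning one question out into many retrievals that run in parallel can be sketched with the standard library. The `retrieve` stub, the facet list, and the `fan_out` helper are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve(sub_query):
    """Hypothetical retrieval call; a real one would hit a search index."""
    return {"query": sub_query, "hits": [f"doc about {sub_query}"]}

def fan_out(question, facets, max_workers=8):
    """Generate one sub-query per facet and run the retrievals concurrently."""
    sub_queries = [f"{question} {facet}" for facet in facets]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order even though calls run in parallel.
        return list(pool.map(retrieve, sub_queries))

results = fan_out("EV battery market", ["pricing", "suppliers", "regulation"])
```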

    The modes compared

    | Mode | What it does | Best for | Typical speed |
    |---|---|---|---|
    | Default (Sonar) | One search pass, fast cited answer | Quick facts and everyday questions | Seconds |
    | Pro Search | Agentic, multi-step tool use across searches and URLs | Comparisons and multi-part questions | Tens of seconds |
    | Deep Research | Many parallel retrievals, reads hundreds of sources, builds a report, deck, or dashboard | Market research, literature scans, due diligence | A few minutes |

    Plans shift often, so check the current tiers before relying on them. As of mid-2026 the free tier gives basic cited answers plus a small daily allowance of Pro Searches and Deep Research runs. Pro, at $20 a month (or $200 a year), lifts those caps, unlocks the model switcher, and includes Perplexity's Comet browser, which dropped its paywall in March 2026. The $200-a-month Max tier adds far higher Computer credits for heavy Deep Research use.

    Is Perplexity good? Strengths and limits

    On balance, Perplexity is strong at what it was built for: current, citation-backed answers and fast source discovery. The retrieval-first design gives it a structural advantage over pure chatbots, because grounding answers in retrieved evidence reduces the room for invention and the citations give you a way to check.

    The hard numbers back this up, with an important caveat. When Columbia University's Tow Center for Digital Journalism tested eight AI search engines on 200 news queries, the field got the source wrong more than 60% of the time on average. Perplexity posted the lowest failure rate at 37%, the best of the group, as reported in the Columbia Journalism Review study. Best in class still meant more than one cited claim in three was off. The study also found a counterintuitive pattern worth remembering: premium tiers answered more prompts but had *higher* error rates than free ones, because they were more willing to give a definitive but wrong answer instead of declining. Confidence is not a proxy for correctness.

    That points to the limits worth naming:

    • Citation quality varies. Perplexity sometimes attaches a citation to a source that says something slightly different, says it in a different context, or in rare cases does not clearly support the claim at all. The footnote tells you where to check, not that the check passed.
    • It can be overconfident when results are thin. If the live web has little good material on a question, the answer can still read as assured. Sparse evidence is where errors cluster.
    • It is built for synthesis, not authorship. Perplexity excels at researching and summarising. For long-form writing, planning, or open-ended creative work, a general-purpose assistant is usually the better tool.
    • Source freshness is only as good as the index. Very recent events or paywalled material may be retrieved incompletely.
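
    A first-pass check on citation quality can be automated, with heavy caveats. The sketch below measures how many of a claim's content words appear in the cited text. It is a naive lexical heuristic with a hypothetical stop-word list and sample sentences: a high score does not prove the source supports the claim, but a score near zero is a cheap flag that the footnote deserves a manual read.

```python
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "at", "was"}

def content_words(text):
    """Lowercase words with trailing punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split()}

def support_ratio(claim, source_text):
    """Share of the claim's content words that also appear in the cited source."""
    claim_words = content_words(claim) - STOP_WORDS
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(source_text)) / len(claim_words)

source = "The company reported revenue of 40 million in 2025."
supported = support_ratio("Revenue was 40 million in 2025.", source)
unsupported = support_ratio("Profit doubled last quarter.", source)
```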

    The practical takeaway: treat Perplexity as a research accelerator, and treat its citations as an invitation to verify rather than proof. The trade-off against traditional search is real: a search engine hands you the raw links and leaves all the reading to you, while Perplexity does the reading for you at the cost of occasional misattribution.

    What this means for brands

    Because Perplexity answers by retrieving and citing live sources, the brands and pages it mentions are not chosen by a model guessing from training data. They are chosen because they were retrieved, ranked, and judged relevant at query time. That makes Perplexity a measurable surface: the same question can be asked repeatedly to see which sources it cites, which brands make a shortlist, and how it describes them. With Comet now free across desktop and mobile, those cited answers reach a wider audience than ever, which raises the stakes on being one of the sources that gets retrieved.

    The catch is variance. Results shift with phrasing and move as the index updates, so spot-checking one query once tells you almost nothing. Tracking the same questions on a schedule is what reveals the pattern, and it is the case for monitoring answer engines systematically rather than by hand, which we make in why spot-checking fails. You can run a first check on your own brand with our free AI visibility checker.
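
    Tracking on a schedule reduces to a simple aggregation once the citations are logged. The sketch below assumes you have already collected the cited URLs from each run of the same question; the sample data and the `citation_share` helper are hypothetical. It reports the fraction of runs in which each domain was cited at least once.

```python
from collections import Counter
from urllib.parse import urlparse

def citation_share(runs):
    """Fraction of runs in which each domain was cited at least once."""
    counts = Counter()
    for cited_urls in runs:
        # A set per run so a domain cited twice in one answer counts once.
        counts.update({urlparse(u).netloc for u in cited_urls})
    return {domain: n / len(runs) for domain, n in counts.items()}

# Hypothetical citations collected from asking the same question on three days.
runs = [
    ["https://a.com/x", "https://b.com/y"],
    ["https://a.com/z"],
    ["https://a.com/x", "https://c.com/q"],
]
share = citation_share(runs)
```

    In this made-up sample, a.com appears in every run while b.com and c.com each appear in one of three, a difference a single spot check could easily miss.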

    Frequently asked questions

    Does Perplexity use ChatGPT?

    Partly. Perplexity is model-agnostic, meaning it routes questions to different language models. Its default fast mode usually runs on Sonar, its own model built on a Meta Llama 3.3 70B base. Paying users can switch the model behind their answers to a frontier option such as a recent GPT, Claude Opus, or Gemini release. So Perplexity can use the same models as ChatGPT, but the product itself is the search, retrieval, and citation layer around them, not the model.

    How is Perplexity different from a normal chatbot?

    A standalone chatbot answers from what it learned during training. Perplexity searches the live web for each question, retrieves and ranks current sources, then writes an answer constrained by that retrieved evidence and attaches citations. That retrieval-augmented design keeps it current beyond a training cutoff and gives you a way to verify claims.

    What is the difference between Pro Search and Deep Research?

    Pro Search is an agentic version of the normal answer flow: the model uses tools to run multiple targeted searches and fetch full page content, then combines the findings, usually in tens of seconds. Deep Research goes much further. As of 2026 it runs inside Perplexity Computer, coordinating more than 20 models to run thousands of retrieval steps in parallel and produce a structured report, presentation deck, or dashboard, which takes a few minutes.

    Is Perplexity accurate?

    It had the lowest error rate of the eight engines tested in the study cited here, but it is not flawless. In Columbia University's Tow Center study, AI search engines got news sources wrong more than 60% of the time on average, while Perplexity posted the lowest failure rate at 37%. Its retrieval-first design grounds answers in cited sources, which helps, but citations are not a guarantee: Perplexity sometimes cites a source that does not fully support the claim, and it can sound confident when web results are thin. Use the citations to verify rather than as proof.

    How much does Perplexity cost?

    There is a free tier that gives basic cited answers plus a small daily allowance of Pro Searches and Deep Research runs. Pro costs $20 a month (or $200 a year), lifting those caps, unlocking the model switcher, and including the Comet browser, which became free in March 2026. The Max tier, at $200 a month, adds much higher Computer credits for heavy Deep Research use. Plans change often, so check the current tiers before relying on them.

    How does Perplexity choose which sources to cite?

    After running a live web search, it retrieves a large pool of candidate pages using hybrid retrieval that blends keyword matching with semantic vector search, run on the Vespa search platform. That pool is narrowed through several ranking stages, including learned models and cross-encoders, that weigh relevance, freshness, and source quality. The top-ranked passages are passed to the language model, which writes the answer and cites the sources it drew from.

    About the author

    My name is Matiss Katanenko and I co-founded Honeyb, the AI visibility platform that tracks how ChatGPT, Gemini, Claude, Perplexity and the other major AI engines talk about brands. I'm based in Riga, Latvia. Before Honeyb I spent years on the agency side running SEO and content programs for fast-growing brands across the US and Europe. That work is where I watched AI search start to compress the entire discovery channel into a four-brand short list, and decided to build the tool I wished agencies had. In my free time I'm in the sauna, on a padel court, or behind a drum kit.
