Ask most large language models what changed in a fast-moving topic last week and you get a confident answer assembled from training data that is months stale. The model has no idea it is wrong. The Model Context Protocol, almost always shortened to MCP, is the standard that lets it find out: a clean, uniform way for an assistant to reach out to live tools and data while it reasons. Perplexity ships an official MCP server that wires its web-grounded search straight into clients like Claude, Cursor and VS Code, so a coding assistant can check the actual release notes instead of guessing. This guide covers what MCP is, exactly what Perplexity's server and the underlying Sonar API do, which of the four tools to reach for, what each one costs as of June 2026, and how to wire it up.
If you are new to how Perplexity assembles answers in the first place, how Perplexity AI works pairs well with this piece. The MCP server is essentially a programmatic doorway into that same search-and-citation engine, exposed so other tools can call it.
What the Model Context Protocol is
MCP is an open standard that Anthropic introduced on 25 November 2024, created by engineers David Soria Parra and Justin Spahr-Summers. Its job is to standardise how AI applications connect to external systems: data sources, tools and workflows. Before MCP, every link between a model and an outside service was a bespoke connector. Ten assistants and ten data sources meant up to a hundred custom integrations, each one its own brittle plumbing to build and maintain.
Anthropic's documentation describes MCP as being like a USB-C port for AI applications. Just as USB-C gives devices one standard socket, MCP gives AI applications one standard way to reach external systems. You build a server once against the protocol, and any MCP-aware client can use it. The pitch is a universal, open standard for connecting AI systems with data sources, collapsing the integration matrix into a single contract.
That pitch has held up. MCP stopped being an Anthropic project within months: OpenAI adopted it in March 2025 across the ChatGPT desktop app, the Agents SDK and the Responses API, and Google DeepMind followed in April 2025 with support in Gemini. By mid-2026 it is the de facto wiring for connecting agents to tools, which is why a single Perplexity server now works across clients from rival labs without anyone writing glue code.
The architecture is deliberately small. An MCP host is the application a person uses, say a chat client or a coding editor. Inside that host, an MCP client holds a connection to one or more MCP servers. Each server exposes capabilities through three primitives: tools the model can call, resources it can read, and prompts it can reuse. Perplexity's server is one such server, and the capabilities it exposes are all search-shaped.
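To make the three primitives concrete, here is a minimal sketch of the messages a client sends. MCP messages are JSON-RPC 2.0, and `tools/list`, `resources/list`, `prompts/list` and `tools/call` are method names from the protocol. The helper `jsonrpc_request`, the tool name `example_search` and its `query` argument are hypothetical, chosen only for illustration, and a real session begins with an `initialize` handshake that is omitted here.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request of the kind an MCP client sends to a server."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# One discovery call per primitive: tools, resources, prompts.
list_tools = jsonrpc_request("tools/list")
list_resources = jsonrpc_request("resources/list")
list_prompts = jsonrpc_request("prompts/list")

# Invoking a tool. The name and argument shape are hypothetical placeholders;
# a real client takes both from the server's tools/list response.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_search", "arguments": {"query": "latest release notes"}},
)

print(json.dumps(call_tool, indent=2))
```

In practice an MCP client library handles this framing for you; the point is only that a tool call is a small, uniform message rather than a bespoke integration.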

What the Perplexity MCP server does
The official Perplexity MCP server ships as the npm package @perplexity-ai/mcp-server under the permissive MIT licence, with source in the perplexityai/modelcontextprotocol repository on GitHub. It is a bridge between an MCP client and Perplexity's Sonar models and Search API. When your assistant decides a question needs current information, it calls one of the server's tools, the server queries Perplexity, and the grounded result, source URLs and all, flows back into the conversation as if the model had known it all along.
The server exposes four tools, each mapped to a specific Sonar capability. They differ sharply in how much work they do, how long they take and what they cost, so the choice of tool is the choice that matters most in practice.
| Tool | What it does | Model behind it |
|---|---|---|
| perplexity_search | Direct web search returning ranked results with titles, URLs and snippets | Perplexity Search API |
| perplexity_ask | Conversational answer with real-time web search | sonar-pro |
| perplexity_research | Deep, multi-step research with thorough analysis and citations | sonar-deep-research |
| perplexity_reason | Chain-of-thought reasoning over fetched information | sonar-reasoning-pro |
A useful way to hold the four in your head is by latency and cost, not by name. perplexity_search is sub-second and cheap: ranked links and snippets you feed into your own logic, no synthesis. perplexity_ask is the everyday workhorse, a grounded natural-language answer in a couple of seconds. perplexity_reason adds explicit chain-of-thought for questions where the answer is an argument rather than a fact, so it emits more tokens and takes longer. perplexity_research is the heavyweight: it fires off many searches, reads widely and synthesises a report, which can take minutes and costs an order of magnitude more. The single most common mistake is letting a chatty assistant default to perplexity_research when perplexity_ask would have answered in a fraction of the time and cost.
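The latency-and-cost ordering above can be written down as a tiny routing rule. This is a sketch of one possible heuristic, not something the server does for you: the `Need` fields and the `pick_tool` function are hypothetical names, and only the four tool names come from the server itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    """Hypothetical description of what a question requires."""
    wants_synthesis: bool = True   # False: raw ranked links and snippets are enough
    needs_argument: bool = False   # True: the answer is a chain of reasoning, not a fact
    needs_report: bool = False     # True: a multi-source report is worth minutes of latency


def pick_tool(need: Need) -> str:
    """Return the cheapest tool that satisfies the need."""
    if not need.wants_synthesis:
        return "perplexity_search"
    if need.needs_report:
        return "perplexity_research"
    if need.needs_argument:
        return "perplexity_reason"
    return "perplexity_ask"  # the everyday default


print(pick_tool(Need()))                     # perplexity_ask
print(pick_tool(Need(needs_report=True)))    # perplexity_research
```

The design choice worth copying is the default: the function falls through to perplexity_ask and only escalates to perplexity_research when a report is explicitly requested.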
The Sonar API underneath
MCP is the transport; Sonar is the engine. The Sonar API is Perplexity's developer platform for web-grounded responses, and the MCP server is a convenience wrapper around it. You can call Sonar directly without MCP at all, which is the right move when you are building your own application rather than augmenting an existing assistant.
The API is OpenAI-compatible. You point an OpenAI client library at the base URL https://api.perplexity.ai, authenticate with a bearer token, and use the familiar chat-completions request shape, as Perplexity's own OpenAI-compatibility docs lay out. The defining difference from a plain LLM call is that every answer comes back with citations: a list of the web pages that informed the response, returned in a structured field so you can verify and render sources rather than trust an ungrounded generation.
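As a rough sketch of that request shape and of pulling the sources back out, the snippet below builds a chat-completions request with the standard library and separates the answer from its citations. It assumes the response carries a top-level `citations` list of URLs next to the usual `choices` array; check the current API reference for the exact field names before relying on this. `build_request` and `extract_answer` are hypothetical helper names.

```python
import json
import os
import urllib.request

API_URL = "https://api.perplexity.ai/chat/completions"


def build_request(question: str, model: str = "sonar-pro") -> urllib.request.Request:
    """Build (but do not send) a chat-completions request with a bearer token."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": question}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ.get('PERPLEXITY_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )


def extract_answer(response: dict) -> tuple[str, list[str]]:
    """Split a parsed response into answer text and cited URLs.

    Assumption: citations arrive in a top-level "citations" list of URLs.
    """
    answer = response["choices"][0]["message"]["content"]
    citations = list(response.get("citations", []))
    return answer, citations


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_request("...")) as resp:
#       answer, sources = extract_answer(json.load(resp))
```

Keeping the citations as a separate return value makes it hard to render an answer without its sources, which is the whole reason to use a grounded API.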
Sonar is a family, and the MCP tools map onto it. A base sonar model handles quick factual lookups. sonar-pro handles complex multi-source queries. sonar-reasoning and sonar-reasoning-pro add chain-of-thought for analytical work. sonar-deep-research runs exhaustive multi-step searches for report-style output. A minimal direct call looks like this:
```bash
curl https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar-pro",
    "messages": [
      { "role": "user", "content": "What changed in the EU AI Act this quarter?" }
    ]
  }'
```
What it actually costs
Sonar pricing has two parts, and the second one trips people up. There is a per-token charge, and on top of it a per-request fee that scales with how much web content the query pulls in. Quoting only the token rate understates the bill. The figures below are Perplexity's published Sonar API pricing as of June 2026.
| Model | Input / 1M tokens | Output / 1M tokens | Request fee / 1K requests |
|---|---|---|---|
| sonar | $1 | $1 | $5 to $12 |
| sonar-pro | $3 | $15 | $6 to $14 |
| sonar-reasoning-pro | $2 | $8 | $6 to $14 |
| sonar-deep-research | $2 | $8 | separate search, reasoning and citation fees |
The request fee varies with how much search context the query consumes, from low to high. Deep research is priced differently again: alongside its token rates it bills search queries at roughly $5 per thousand and adds reasoning and citation token charges, because a single deep-research call can run dozens of underlying searches. One run can cost more than hundreds of perplexity_ask calls, which is the real argument for matching the tool to the task rather than comparing token prices in isolation.
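To see how the two parts combine, here is a small estimator using the rates from the table above. It is a sketch under simplifying assumptions: the `PRICING` dictionary and `estimate_cost` function are hypothetical names, only the low and high ends of the request fee are modelled, and sonar-deep-research is left out because its extra search, reasoning and citation charges do not fit this two-part shape.

```python
# USD per 1M tokens, plus the (low, high) request fee per 1K requests,
# copied from the pricing table above.
PRICING = {
    "sonar": {"input": 1.0, "output": 1.0, "request_fee": (5.0, 12.0)},
    "sonar-pro": {"input": 3.0, "output": 15.0, "request_fee": (6.0, 14.0)},
    "sonar-reasoning-pro": {"input": 2.0, "output": 8.0, "request_fee": (6.0, 14.0)},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  requests: int = 1) -> tuple[float, float]:
    """Return a (low, high) USD estimate for a batch of requests.

    input_tokens and output_tokens are totals across the whole batch.
    """
    rates = PRICING[model]
    token_cost = (input_tokens / 1_000_000 * rates["input"]
                  + output_tokens / 1_000_000 * rates["output"])
    low_fee, high_fee = (requests / 1000 * fee for fee in rates["request_fee"])
    return token_cost + low_fee, token_cost + high_fee


# 1,000 sonar-pro calls averaging 500 input and 800 output tokens each.
low, high = estimate_cost("sonar-pro", 500_000, 800_000, requests=1000)
print(f"${low:.2f} to ${high:.2f}")
```

For that batch the token charge is $13.50 and the request fee adds between $6 and $14, so the fee ranges from under a third to more than half of the total. That is why quoting only the token rate understates the bill.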
Use cases that actually fit
MCP earns its keep when an assistant needs information it could not have memorised. A few patterns recur.
- Coding assistants that need current library behaviour. A developer in Cursor or VS Code asks about a breaking change in a dependency, and perplexity_ask returns the answer with links to the release notes instead of a hallucinated API from training data.
- Research and drafting workflows. perplexity_research compiles a sourced overview so a writer starts from cited material rather than a blank page, then trims rather than fabricates.
- Competitive and market questions. perplexity_search returns ranked sources on a competitor or a market, which the assistant summarises and attributes.
- In-chat fact-checking. When an answer hinges on a recent event, perplexity_ask grounds it and surfaces the citations so the user can check the claim against the source.
For teams that track how they appear across AI answer engines, the same plumbing has a second life. You can script Sonar queries to capture how Perplexity describes a brand or which sources it cites, then watch that move over time. That is adjacent to what a dedicated AI visibility monitoring platform does at scale, and it has the same trap: a handful of manual prompts is anecdote, not signal. Our free AI visibility checker gives a quick read without writing any code, and Perplexity vs ChatGPT brand ranking shows why a brand can rank well on one engine and vanish on another.
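A scripted version of that idea might tally which domains get cited across a batch of saved responses. The sketch below is illustrative only: `cited_domains` is a hypothetical helper, and it assumes each saved response is a parsed JSON object with a `citations` list of URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(responses: list[dict]) -> Counter:
    """Count how many responses cite each domain (at most once per response)."""
    counts: Counter = Counter()
    for response in responses:
        domains = {
            urlparse(url).netloc.lower().removeprefix("www.")
            for url in response.get("citations", [])
        }
        counts.update(domains)
    return counts


saved = [
    {"citations": ["https://www.example.com/a", "https://example.com/b",
                   "https://docs.example.org/x"]},
    {"citations": ["https://example.com/c"]},
]
print(cited_domains(saved).most_common())
```

Counting a domain once per response, rather than once per URL, keeps a single link-heavy answer from dominating the tally. The counts only become signal once the batch covers many prompts over time.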
Setting it up
The fastest path is to add the server to whichever MCP client you already use. The shape is the same across Claude Desktop, Cursor, VS Code, Windsurf and the rest: you declare a server, the command that launches it, and the environment variable holding your key. First, create an API key from your Perplexity account's API portal.
The server supports two transports. The default is stdio, where the client launches the server as a local subprocess and they talk over standard input and output. There is also an HTTP server mode, listening on port 8080 by default and configurable through a PORT variable, for shared or cloud deployments where several clients hit one running instance. For a single developer on one machine, stdio is the simpler choice and the one the snippet below uses.
```json
{
  "mcpServers": {
    "perplexity": {
      "command": "npx",
      "args": ["-y", "@perplexity-ai/mcp-server"],
      "env": {
        "PERPLEXITY_API_KEY": "your_key_here"
      }
    }
  }
}
```
The `npx -y` invocation fetches and runs the latest published package, so there is no global install to maintain. If you use Claude Code, the same server is added from the command line with `claude mcp add perplexity --env PERPLEXITY_API_KEY="your_key_here" -- npx -y @perplexity-ai/mcp-server`. Once the client restarts and connects, the four tools appear and the assistant calls them when a question warrants live search.
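If your client already has other servers configured, pasting the JSON by hand risks overwriting them. A small merge helper avoids that. This is a sketch: `add_perplexity_server` is a hypothetical function, the location of the config file varies by client and is left to you, and only the `mcpServers` entry itself mirrors the configuration shown above.

```python
import copy


def add_perplexity_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with the Perplexity server added.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["perplexity"] = {
        "command": "npx",
        "args": ["-y", "@perplexity-ai/mcp-server"],
        "env": {"PERPLEXITY_API_KEY": api_key},
    }
    return updated


existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_perplexity_server(existing, "your_key_here")
print(sorted(merged["mcpServers"]))
```

To use it, load the client's config file with `json.load`, pass the result through the helper, and write it back with `json.dump`.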
Configuration options worth knowing
Beyond the required key, the server reads a handful of optional environment variables that start to matter once you move past a first test. They cover timeouts, routing and logging, which are the three things that tend to bite in real deployments.
| Variable | Effect |
|---|---|
| PERPLEXITY_API_KEY | Required. Authenticates every request to the Sonar and Search APIs. |
| PERPLEXITY_TIMEOUT_MS | Request timeout, defaulting to five minutes. Raise it only if deep research times out. |
| PERPLEXITY_BASE_URL | Overrides the API endpoint, defaulting to https://api.perplexity.ai. Useful behind a gateway. |
| PERPLEXITY_PROXY | Routes traffic through a network proxy, common in corporate environments. |
| PERPLEXITY_LOG_LEVEL | Sets verbosity to DEBUG, INFO, WARN or ERROR when diagnosing issues. |
Leave the timeout alone for perplexity_ask and perplexity_search, and raise it only if you lean on perplexity_research, which legitimately takes minutes because of all those underlying searches. Turning the log level up to DEBUG is the first move when a tool call fails silently, since it surfaces the exact request and response rather than swallowing the error.
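One way to keep these settings honest is to build the `env` block in code and validate it before it reaches a config file. The variable names and the allowed log levels below come from the table above. The `build_env` function and its checks are a hypothetical sketch and say nothing about how the server itself validates input.

```python
from typing import Optional

ALLOWED_LOG_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}


def build_env(api_key: str,
              timeout_ms: Optional[int] = None,
              base_url: Optional[str] = None,
              proxy: Optional[str] = None,
              log_level: Optional[str] = None) -> dict[str, str]:
    """Build the env mapping for the server, including only what was set."""
    if not api_key:
        raise ValueError("PERPLEXITY_API_KEY is required")
    env = {"PERPLEXITY_API_KEY": api_key}
    if timeout_ms is not None:
        if timeout_ms <= 0:
            raise ValueError("timeout_ms must be positive")
        env["PERPLEXITY_TIMEOUT_MS"] = str(timeout_ms)  # env values are strings
    if base_url is not None:
        env["PERPLEXITY_BASE_URL"] = base_url
    if proxy is not None:
        env["PERPLEXITY_PROXY"] = proxy
    if log_level is not None:
        level = log_level.upper()
        if level not in ALLOWED_LOG_LEVELS:
            raise ValueError(f"log_level must be one of {sorted(ALLOWED_LOG_LEVELS)}")
        env["PERPLEXITY_LOG_LEVEL"] = level
    return env


# A deep-research-heavy setup: ten-minute timeout, verbose logging.
print(build_env("your_key_here", timeout_ms=10 * 60 * 1000, log_level="debug"))
```

Unset options are omitted rather than written with their defaults, so the server's own defaults keep applying.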
Where this fits in your stack
MCP and Sonar solve different layers of the same problem. If you are extending an assistant you already use, the MCP server is the least-effort route: install, add a key, and the model gains grounded search. If you are building a product, calling the Sonar API directly gives you control over model selection, prompt construction and how citations are rendered, plus the option to budget tool-by-tool rather than hoping the assistant picks well. Plenty of teams do both: MCP for interactive work, the raw API for automated pipelines.
Either way, the value is the thing that makes Perplexity useful as a product in the first place: answers tied to sources you can check. For the broader shift toward answer engines that cite their work, what AI search is sets the context. The MCP server simply makes that capability callable from inside the tools you already work in.




