    Published June 15, 2026 · 10 min read

    What Is llms.txt? The Standard for Making Your Site AI-Readable

    llms.txt is a proposed standard that hands large language models a clean, curated map of your site. Here is what the file contains, how it differs from robots.txt and sitemap.xml, what two studies of over 300,000 domains found about whether it moves AI citations, and how to ship one without overselling it.

    Matiss Katanenko

    Co-founder, Honeyb

    Two research teams have now tested the same question, whether publishing llms.txt changes how often AI answers cite a site, using server logs and citation data, and they reached the same answer. SE Ranking studied roughly 300,000 domains and found no relationship between publishing llms.txt and how often a site is cited in AI answers. Trakkr scanned 37,894 AI-cited domains and measured 6.8 average citations for sites with the file versus 6.7 for sites without it, a gap with a p-value of 0.85, which in statistical terms is indistinguishable from noise. That is the single most useful fact about llms.txt in 2026, and most guides bury it. This one leads with it, then explains what the file is, where it genuinely helps, and how to ship one without kidding yourself about what it does.

    What llms.txt is, in plain terms

    llms.txt is a proposed standard, not a ratified one. It was put forward by Jeremy Howard, co-founder of the research lab Answer.AI, in a post published on 3 September 2024. The file lives at a fixed location, `yourdomain.com/llms.txt`, and it is written in plain Markdown. Its job is to hand a language model a concise, structured overview of a site so the model does not have to parse a full page of navigation, adverts, scripts and styling to find the substance.

    The reasoning in the original proposal is worth reading because it names the gap the file is meant to fill. Howard describes the ambiguity a model faces when building context from a website: should it crawl the sitemap and ingest every page, follow external links, or for software documentation try to pull in the source code too? His answer is a single sentence that sums up the whole proposal: "Site authors know best, and can provide a list of content that an LLM should use." llms.txt is, in effect, a hand-drawn directory written for machines that read text rather than render pages.

    The format is deliberately minimal. The specification at llmstxt.org defines a fixed running order: an optional byte-order mark, then a single H1 with the site or project name (the only strictly required element), then a blockquote giving a short summary, then any number of free-text Markdown sections, and finally any number of H2 sections that each hold a list of links. Each link is a standard Markdown hyperlink followed optionally by a colon and a short note describing what sits behind it. A section titled "Optional" carries special meaning: the spec says its URLs "can be skipped if a shorter context is needed," so a model under a tight token budget drops those first.
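
To make that running order concrete, here is a minimal Python sketch of a reader for the format. It is an illustration under stated assumptions, not a reference parser: the function names `parse_llms_txt` and `links_for_budget`, the regular expression and the example.com URLs are all hypothetical, and the code ignores free-text sections rather than keeping them.

```python
import re

# One list item: "- [title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary and H2 link sections."""
    title = None
    summary_lines = []
    sections = {}   # H2 heading -> list of (title, url, note)
    current = None  # name of the H2 section being read, if any

    # Strip a leading byte-order mark if one is present.
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return {
        "title": title,
        "summary": " ".join(summary_lines),
        "sections": sections,
    }


def links_for_budget(parsed, tight_budget):
    """Return link URLs, leaving out the "Optional" section when context is tight."""
    urls = []
    for heading, links in parsed["sections"].items():
        if tight_budget and heading.lower() == "optional":
            continue
        urls.extend(url for _, url, _ in links)
    return urls


# Hypothetical sample file used only to exercise the sketch.
SAMPLE = """# Example Project

> A short summary of the project.

## Docs

- [Quick start](https://example.com/start.md): first steps
- [API](https://example.com/api.md)

## Optional

- [Changelog](https://example.com/changelog.md): release notes
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])
print(links_for_budget(parsed, tight_budget=True))
```

With `tight_budget=True` the changelog link is left out, which is the behaviour the spec's "Optional" wording permits; whether a given tool actually does this is up to that tool.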

    A minimal llms.txt example

    Here is a small, valid file that follows the specification. Note the single H1, the summary blockquote, the descriptive link sections, and the Optional block at the end that a model may skip when context is scarce.

    ```markdown
    # Honeyb

    > Honeyb is an AI visibility monitoring platform that tracks how brands
    > are mentioned, cited and ranked across AI answer engines.

    This file points language models to the pages most useful for
    understanding the product, its methodology and its documentation.

    ## Core pages

    - [What Honeyb does](/product): scanning, share-of-voice and sentiment tracking
    - [Methodology](/methodology): how scans are run and scored
    - [Supported engines](/engines): the answer engines monitored

    ## Guides

    - [Getting started](/docs/start): first scan and reading a report
    - [Glossary](/docs/glossary): visibility, citations and sentiment defined

    ## Optional

    - [Changelog](/changelog): release notes
    - [Brand assets](/brand): logos and usage guidance
    ```

    Some sites also publish a companion file, commonly named `llms-full.txt`, which inlines the full text of those linked pages into one large document rather than just pointing to them. This convention grew out of the proposal's own ecosystem rather than the core spec. Anthropic's documentation is the canonical example of the split: its `llms.txt` index runs to roughly 8,000 tokens, while its `llms-full.txt` packs the entire API documentation into about 481,000 tokens. The trade-off is plain. The short file is cheap to fetch and easy to keep accurate; the full file lets a model read everything in one request but is large and drifts out of date the moment a page changes.
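
As a rough illustration of that trade-off, the Python sketch below stitches already-exported page bodies into one large document and estimates its size. Everything here is an assumption made for the example: the function names are hypothetical, the layout of the combined file is one plausible choice rather than a format the core spec defines, and the four-characters-per-token figure is only a rule of thumb that varies by tokenizer and language.

```python
def build_llms_full(title, summary, pages):
    """Concatenate page bodies into one Markdown document.

    `pages` is a list of (heading, markdown_body) pairs that the caller has
    already fetched or exported; nothing is downloaded here.
    """
    parts = [f"# {title}", "", f"> {summary}", ""]
    for heading, body in pages:
        parts.append(f"## {heading}")
        parts.append("")
        parts.append(body.strip())
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"


def rough_token_estimate(text, chars_per_token=4):
    """Very rough size estimate; the default ratio is only a rule of thumb."""
    return len(text) // chars_per_token


# Hypothetical page bodies used only to exercise the sketch.
full = build_llms_full(
    "Example",
    "Summary.",
    [("Start", "Body one.\n"), ("API", "Body two.")],
)
print(full)
print(rough_token_estimate(full))
```

Regenerating the combined file on every deploy, rather than editing it by hand, is one way to limit the drift described above.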

    Figure: An llms.txt file follows plain Markdown: one H1, a summary blockquote, then sections of described links.

    How it differs from robots.txt and sitemap.xml

    It is tempting to call llms.txt a robots.txt for AI, but the three files do genuinely different jobs and sit at very different levels of maturity. robots.txt is a permission file: it tells crawlers what they may and may not fetch, and it became a formal internet standard in 2022 as RFC 9309. sitemap.xml is a discovery file: it lists URLs so crawlers can find every page, and it is governed by the long-established protocol at sitemaps.org. llms.txt is a curation file: it selects and describes the pages that matter most, and it remains a proposal with no governing body behind it.

    | Attribute | robots.txt | sitemap.xml | llms.txt |
    | --- | --- | --- | --- |
    | Primary purpose | Control crawler access | List all crawlable URLs | Curate key content for LLMs |
    | Format | Plain text directives | XML | Markdown |
    | Audience | Search and AI crawlers | Search crawlers | Language models |
    | Status | Standard (RFC 9309, 2022) | Established protocol | Proposed, unratified |
    | Read by major engines | Yes | Yes | Not confirmed |

    The final row is the one that matters, and it is the source of nearly all the controversy. robots.txt and sitemap.xml are read and acted upon by the major engines every day. Whether anything reads llms.txt is a separate and, so far, unresolved question.

    Do AI engines actually use it?

    Here honesty beats enthusiasm. As of mid-2026, no major AI provider has confirmed that its answer engine uses llms.txt to ground or rank responses, and the evidence that they ignore it has only got stronger. The clearest statement on record came from Google's John Mueller in a Reddit comment in April 2025, reported by Search Engine Journal: "AFAIK none of the AI services have said they're using LLMs.TXT (and you can tell when you look at your server logs that they don't even check for it)." He drew an unflattering comparison, likening the file to the old keywords meta tag that search engines abandoned because it was self-declared and easy to game: "this is what a site-owner claims their site is about ... at that point, why not just check the site directly?"

    Since then the anecdote has hardened into data, and two independent studies tell the same story:

    • SE Ranking, ~300,000 domains (November 2025). Adoption sat at 10.13%, and the study found no meaningful relationship between publishing llms.txt and how often a domain appeared in LLM answers. When the researchers dropped llms.txt from their citation-prediction model, accuracy went up, because the variable was adding noise rather than signal. Their conclusion: it "doesn't seem to directly impact AI citation frequency. At least not yet."
    • Trakkr, 37,894 AI-cited domains (March 2026). Sites with llms.txt averaged 6.8 citations against 6.7 without, statistically indistinguishable at p=0.85, with an effect size of r=-0.065, well below the threshold for even a "small" effect. Among the 50 most-cited sites on the web, only 6% published the file at all.
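
For readers who want to sanity-check how figures like these are usually read, the short Python sketch below applies two common conventions. Both are assumptions brought in for illustration: the effect-size bands are Cohen's widely cited rules of thumb for a correlation coefficient (0.10 small, 0.30 medium, 0.50 large), and 0.05 is the conventional significance cut-off. Neither study's own method is reproduced here, and the function names are hypothetical.

```python
def describe_effect(r):
    """Label a correlation coefficient using Cohen's rule-of-thumb bands."""
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "small"
    if size < 0.50:
        return "medium"
    return "large"


def is_significant(p_value, alpha=0.05):
    """True when the p-value falls below the chosen significance level."""
    return p_value < alpha


# Figures quoted from the Trakkr study above.
print(describe_effect(-0.065))
print(is_significant(0.85))
```

An effect of r = -0.065 sits under the 0.10 band, and p = 0.85 is far above 0.05, which is why the 6.8 versus 6.7 gap is treated as no difference.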

    Put together, the picture is consistent. The large engines already crawl and render real pages, so a separate self-authored summary is redundant from their side. And because the file is written by the site owner, it invites exaggeration, which is exactly the weakness that retired the keywords meta tag. Server-log studies, including a Semrush test that ran from August to October 2025, report little to no traffic from GPTBot, ClaudeBot or PerplexityBot requesting the file. Claims that a given consumer engine "reads" llms.txt are, as of June 2026, mostly vendor marketing rather than confirmed behaviour.
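
You can run the same kind of check on your own logs. The Python sketch below counts requests for `/llms.txt` by user-agent substring. It assumes the common "combined" access log format, and the log lines in `SAMPLE_LOG` are invented for illustration; the function name and regular expression are hypothetical and will need adjusting to your server's actual format.

```python
import re
from collections import Counter

# User-agent substrings for the crawlers named above.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_requests(lines, path="/llms.txt"):
    """Count requests for `path` per AI crawler, matched by user-agent substring."""
    counts = Counter({bot: 0 for bot in AI_BOTS})
    for line in lines:
        match = LOG_RE.search(line)
        # Ignore unparseable lines and requests for other paths.
        if not match or match["path"].split("?")[0] != path:
            continue
        for bot in AI_BOTS:
            if bot.lower() in match["agent"].lower():
                counts[bot] += 1
    return counts


# Invented log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Sep/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.9 - - [01/Sep/2025:10:01:00 +0000] "GET /pricing HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Sep/2025:10:02:00 +0000] "GET /llms.txt HTTP/1.1" '
    '404 0 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

print(dict(count_bot_requests(SAMPLE_LOG)))
```

User-agent strings can be spoofed, so a count like this is a first look rather than proof of which crawler made the request.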

    There is a real counter-case, though, and it is narrower than the hype suggests. The file demonstrably helps agentic and developer-facing tools that fetch documentation on demand. Coding assistants such as Cursor and retrieval agents can use llms.txt to pull the right pages with far less token waste than scraping rendered HTML. That is exactly why developer-heavy companies such as Anthropic, Stripe, Cloudflare and Vercel publish one for their docs, and why Trakkr found SaaS and developer-tool sites leading adoption at 24.1%, more than double the overall rate. The benefit there is concrete and present-tense. The benefit for general consumer answer engines is, for now, speculative.

    A simple way to decide what llms.txt is for

    The confusion around the file dissolves once you separate two audiences it could serve. They want different things, and llms.txt only delivers for one of them today.

    | You want... | Who reads it | Does llms.txt help today? |
    | --- | --- | --- |
    | AI coding tools to answer questions about your docs accurately | Cursor, Claude, retrieval agents, MCP integrations | Yes, measurable token savings and better answers |
    | Your brand cited more often in ChatGPT, Perplexity or Google AI answers | Consumer answer engines | No evidence, per two studies of 300k+ domains |

    If you ship documentation that developers query through AI tools, llms.txt earns its place. If your goal is to be named and cited in consumer AI answers, it is not the lever. Knowing which column you are in stops you from filing a docs convenience under "SEO" and expecting it to move rankings.

    Where llms.txt fits in a wider AI-visibility plan

    For the citation goal, the work that actually moves the needle is structural and content-led, and it sits under generative engine optimisation. Clean, well-structured pages, accurate schema markup that supports AI visibility, authoritative sourcing, and a sensible crawl policy expressed through the files engines genuinely read all matter more than a Markdown index that may go unrequested. The deeper question of how AI models choose which brands to recommend has very little to do with self-declared files and almost everything to do with what the wider web says about you.

    If your aim is to manage how AI systems reach your site rather than to be cited by them, the higher-leverage work is in your access controls. Knowing which bots fetch your pages, and deciding which to allow, is worth more than a curation file with no confirmed reader. Our reference on AI crawler user agents lists the named crawlers from the major providers and how to spot them in your logs, which is the same log analysis that fuelled the llms.txt debate in the first place.
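
Access decisions of that kind live in robots.txt, and you can test a policy before deploying it. The sketch below uses `urllib.robotparser` from Python's standard library. The policy in `ROBOTS_TXT` is a made-up example, not a recommendation, and the helper name `allowed` and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up policy: one bot partly restricted, one blocked, everyone else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent, url in [
    ("GPTBot", "https://example.com/private/report"),
    ("GPTBot", "https://example.com/docs"),
    ("ClaudeBot", "https://example.com/docs"),
    ("SomeBrowser", "https://example.com/docs"),
]:
    print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

robots.txt states a request, and compliance depends on the crawler honouring it, so pair a check like this with the log analysis described above.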

    How to create and validate one

    Creating the file is the easy part, and you can do it by hand in a few minutes. The steps are the same regardless of platform.

    • Start with a single H1 holding your site or product name, then add a blockquote of one or two sentences summarising what the site is for.
    • Add H2 sections grouping your most important pages, for example "Documentation", "Product" and "Guides", and under each add Markdown links with a short note after a colon.
    • Move anything secondary, such as a changelog or brand assets, into a final section titled "Optional" so a token-constrained model can drop it first.
    • Keep the file lean. The point is curation, so link to the pages that matter rather than every URL on the site.
    • Save it as `llms.txt`, encode it as UTF-8 plain text, and serve it at the root of your domain so it resolves at `yourdomain.com/llms.txt`.
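
If you would rather generate the file from data than type it, the steps above can be sketched in a few lines of Python. This is one possible approach under stated assumptions: the function names, the `(title, url, note)` tuple shape and the sample content are all hypothetical, and where the file ends up being served from depends on your own platform.

```python
def render_llms_txt(name, summary, sections, optional=None):
    """Render llms.txt text from a name, a summary and grouped links.

    `sections` maps an H2 heading to a list of (title, url, note) tuples.
    `optional` is a list of the same tuples, emitted last under "Optional".
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    groups = list(sections.items())
    if optional:
        groups.append(("Optional", optional))
    for heading, links in groups:
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def write_llms_txt(text, path="llms.txt"):
    """Write the file as UTF-8 plain text with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)


# Hypothetical content used only to exercise the sketch.
text = render_llms_txt(
    "Example",
    "Summary.",
    {"Docs": [("Start", "/docs/start", "first steps")]},
    optional=[("Changelog", "/changelog", "")],
)
print(text)
```

Calling `write_llms_txt(text)` saves the result in the current directory; copying it to the web root so it resolves at `yourdomain.com/llms.txt` is a separate deployment step.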

    Once published, validate it. A correct file is plain Markdown with no HTML, images or styling, a single H1, a summary blockquote, and link lists under H2 headings. An early audit of files in the wild already flags recurring anti-patterns: dumping every URL on the site, writing keyword-stuffed descriptions, and letting the file rot until its links 404. Treat it like documentation, not like a meta tag. Free generator and validator tools will check structure against the published spec and catch broken links, but when a tool and a blog post disagree, defer to the specification at llmstxt.org.
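
The structural checks in that paragraph are simple enough to sketch yourself. The Python below mirrors this article's checklist, which is slightly stricter than the spec because it treats the summary blockquote as required. The function name and the regular expressions are hypothetical and deliberately crude, and the sketch does not fetch links, so it will not catch 404s.

```python
import re


def validate_llms_txt(text):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()

    h1_lines = [line for line in lines if line.startswith("# ")]
    if len(h1_lines) != 1:
        problems.append(f"expected exactly one H1, found {len(h1_lines)}")

    non_blank = [line for line in lines if line.strip()]
    if non_blank and not non_blank[0].startswith("# "):
        problems.append("first non-blank line should be the H1")

    if not any(line.startswith(">") for line in lines):
        problems.append("missing summary blockquote")

    # Crude tag check; it also flags Markdown autolinks such as <https://...>.
    if any(re.search(r"<[a-zA-Z/][^>]*>", line) for line in lines):
        problems.append("contains HTML tags")

    if any(re.search(r"!\[[^\]]*\]\(", line) for line in lines):
        problems.append("contains images")

    # After the first H2, every non-blank line should be a link list item.
    in_h2 = False
    for line in lines:
        if line.startswith("## "):
            in_h2 = True
        elif in_h2 and line.strip():
            if not re.match(r"^- \[[^\]]+\]\([^)]+\)", line):
                problems.append(f"non-link line under an H2: {line.strip()!r}")
    return problems


# Hypothetical inputs used only to exercise the sketch.
good = "# Example\n\n> Summary.\n\n## Docs\n\n- [Start](/docs/start): first steps\n"
bad = "# One\n# Two\n\n<div>hi</div>\n"
print(validate_llms_txt(good))
print(validate_llms_txt(bad))
```

A check like this suits a pre-deploy step, with a separate link checker covering the stale-URL problem.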

    Should you publish one?

    For most sites the honest answer is a measured yes, with realistic expectations. The cost is small, the file is easy to keep accurate if you keep it short, and it delivers a clear benefit for the agentic and developer tools that do read it. What you should not do is treat llms.txt as a ranking factor for consumer answer engines, because two studies covering more than 300,000 domains found no such effect. Publish it as good housekeeping, point it at your strongest pages, and put your real optimisation effort into the content and structure that engines actually consume.

    The reason the file keeps generating debate is that demand for any AI-visibility tactic has climbed steadily, and llms.txt is one of many being tested in public.

    Figure: Rising demand for AI search optimisation terms. Monthly US search volume for four AI search optimisation queries. All four trended up over the period as brands began treating AI visibility as a discipline. Source: Google Ads search volume, June 2025 to May 2026, retrieved via DataForSEO.

    The only way to know whether any of this is working, llms.txt included, is to stop guessing from a single tool or a single search and instead measure how often your brand is actually surfaced, cited and described across the engines over time. That is the difference between a hopeful tactic and a verified result, and it is the gap that turns advice like this into something you can prove for your own domain. For a structured way to do that, see our guide to AI visibility monitoring and why spot-checking fails.

    Frequently asked questions

    Is llms.txt an official standard?

    No. It is a proposed standard put forward by Jeremy Howard of Answer.AI on 3 September 2024, with its specification published at llmstxt.org. Unlike robots.txt, which became an internet standard as RFC 9309 in 2022, llms.txt has no governing body and no formal ratification.

    Do ChatGPT, Google or Perplexity read llms.txt?

    There is no confirmation that the major consumer answer engines use llms.txt to ground or rank responses, and the evidence points the other way. Google's John Mueller stated that no AI service has said it uses the file and that server logs show they do not even request it, and server-log tests report little to no traffic from GPTBot, ClaudeBot or PerplexityBot. The file does have a confirmed benefit for developer-facing tools such as coding assistants that fetch documentation on demand.

    Does publishing llms.txt increase how often AI engines cite my site?

    Two large studies say no. SE Ranking analysed roughly 300,000 domains in November 2025 and found no relationship between having llms.txt and citation frequency; removing the llms.txt variable from their citation-prediction model actually improved its accuracy. Trakkr scanned 37,894 AI-cited domains in March 2026 and measured 6.8 average citations with the file versus 6.7 without, a difference that is not statistically significant (p=0.85).

    What is the difference between llms.txt and llms-full.txt?

    llms.txt is a short index that links to your key pages with brief descriptions. A companion file, commonly named llms-full.txt, inlines the full text of those pages into one large document so a model can read everything in a single fetch. Anthropic, for example, ships an llms.txt of about 8,000 tokens alongside an llms-full.txt of around 481,000 tokens. The short file is cheaper to serve and easier to keep accurate; the full file is large and more likely to drift out of date.

    How is llms.txt different from a sitemap?

    A sitemap.xml lists every URL so crawlers can discover all your pages, and it is governed by the protocol at sitemaps.org. llms.txt does the opposite: it curates and describes only the pages that matter most, written in Markdown for language models rather than XML for search crawlers.

    How do I create and validate an llms.txt file?

    Write a Markdown file with a single H1 site name, a blockquote summary, then H2 sections containing described links to your important pages, with secondary items under an Optional section. Save it as UTF-8 plain text named llms.txt and serve it at your domain root. Free generator and validator tools can check the structure against the specification and flag broken links, but defer to llmstxt.org when sources disagree.

    About the author

    Matiss Katanenko

    Co-founder, Honeyb

    My name is Matiss Katanenko and I co-founded Honeyb, the AI visibility platform that tracks how ChatGPT, Gemini, Claude, Perplexity and the other major AI engines talk about brands. I'm based in Riga, Latvia. Before Honeyb I spent years on the agency side running SEO and content programs for fast-growing brands across the US and Europe. That work is where I watched AI search start to compress the entire discovery channel into a four-brand short list, and decided to build the tool I wished agencies had. In my free time I'm in the sauna, on a padel court, or behind a drum kit.
