
Does llms.txt actually matter? Here’s what’s hype, what’s useful, and what to fix first.

April 23rd 2026

By Emily

There’s a certain kind of SEO panic that appears every time a new file gets a name and a bit of momentum.

First it was robots.txt. Then sitemap.xml. Now it’s llms.txt.

And fair enough. If you’re watching search splinter into Google, AI Overviews, ChatGPT, Gemini, Claude and whatever comes next, it’s tempting to think one neat little text file might help you keep up.

It might help a bit. It’s not the thing to fix first.

First: what even is llms.txt?

llms.txt is a proposed standard, published by Jeremy Howard on llmstxt.org in September 2024. The idea is simple: place a markdown file at /llms.txt on your site that gives language models a clean summary of what your site is, what matters, and where to find the useful stuff. It’s meant to help models use website information at inference time, not act as a crawler control file. That distinction matters.
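Following the format described in the proposal, a minimal llms.txt is an H1 title, a blockquote summary, and H2 sections listing key links. A sketch of what one might look like (the site name, sections and URLs here are purely illustrative):

```markdown
# Example Agency

> A creative digital agency specialising in web design, development and branding.

## Services

- [Web Design](https://example.com/services/web-design): Bespoke design work for brands
- [Web Development](https://example.com/services/web-development): Full-stack builds and CMS work

## Case Studies

- [Selected work](https://example.com/work): Client projects with outcomes and context
```

The whole point is curation: a short, clean map of what matters, not a dump of every URL on the site.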

robots.txt tells crawlers what they can access. A sitemap helps discovery. Structured data helps machines understand what a page is about. llms.txt is closer to a curated guidebook. Helpful in theory. Not a magic visibility switch.

So, does it matter?

Yes, but not in the way some people are selling it.

Right now, llms.txt matters more as an emerging convention than a proven ranking or citation lever. The official proposal exists, and some platforms and plugins support generating one, but the more important point is this: the major AI platforms publicly document crawler access, indexing controls, and content discoverability through things like robots.txt, specific user agents, noindex, crawlability, and structured data. In the documentation, those are the mechanisms the big players explicitly describe today.

That means the hype gets ahead of the evidence when people talk about llms.txt like it’s the new robots.txt.

It isn’t.

If your site is hard to crawl, badly structured, thin on proof, or accidentally blocking AI search crawlers, adding llms.txt is like polishing the brass while the engine falls out.

What the hype gets wrong

A lot of the noise around llms.txt comes from people treating it as if it does one of three things:

  1. gets you indexed by AI tools
  2. boosts brand mentions in LLMs
  3. replaces technical SEO

None of those claims are supported by the official docs we’ve got.

OpenAI’s publisher guidance says that if you want your content included in ChatGPT summaries and snippets, you should make sure you’re not blocking OAI-SearchBot. It also says publishers can track referral traffic from ChatGPT via utm_source=chatgpt.com, and that noindex is the correct way to stop pages appearing, not some speculative AI-only shortcut.

Google’s documentation points publishers to Google-Extended if they want to control whether content can be used for Gemini training and grounding, while keeping ordinary Google Search crawling separate. Again, that’s an established control inside robots.txt, not llms.txt.

So if someone tells you llms.txt is the file that will unlock AI visibility, they’re skipping over the bits the platforms themselves are actually documenting.

What’s genuinely useful about it

Now for the less shouty answer: llms.txt can still be useful.

It gives you a forced moment of clarity. To create a good one, you have to decide:

  • what your site’s actually about
  • which pages matter most
  • which documentation, services, products or articles deserve to be surfaced
  • how to describe them cleanly, without navigation sludge and CMS clutter

That exercise has value even if no model ever reads the file.

In other words, llms.txt can be useful because it pushes you towards clearer information architecture and cleaner summaries. Those are good things anyway. And they align with what search engines and AI systems already reward: accessible content, crawlable links, understandable page structure, and machine-readable context.

What to fix first

Before you spend time generating llms.txt, fix the boring stuff that still does the heavy lifting.

1. Make sure the right crawlers can access the right content

This is the obvious one, and yet people still get it wrong.

If you want visibility in ChatGPT search experiences, don’t block OAI-SearchBot. If you want to opt out of OpenAI training, that’s a different control: GPTBot. Google also separates ordinary search crawling from Google-Extended, which manages use for Gemini training and some grounding contexts.

That separation matters. “Allow search, block training” is a real technical decision now.
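As a sketch, a robots.txt expressing "allow search, block training" might look like the following. The user agent names shown are the ones the platforms document today, but check OpenAI's and Google's current crawler documentation before relying on them:

```text
# Allow ChatGPT's search crawler to surface content
User-agent: OAI-SearchBot
Allow: /

# Opt out of OpenAI model training
User-agent: GPTBot
Disallow: /

# Opt out of Gemini training and grounding
User-agent: Google-Extended
Disallow: /

# Ordinary Google Search crawling is a separate agent and is unaffected
User-agent: Googlebot
Allow: /
```

Note that Google-Extended is a control token read from robots.txt, not a crawler with its own fetches, so blocking it does not change how Googlebot crawls the site.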

2. Fix crawlability and internal linking

Google is still very clear on this: if content isn’t reachable through crawlable links, discovery gets harder. Clean linking, sensible architecture, and accessible navigation help crawlers find and understand your pages.

If your best case study is buried three clicks deep behind a JS-heavy carousel with no proper internal links, llms.txt won’t rescue it.

3. Use structured data properly

Google explicitly says it uses structured data it finds on the web to understand page content and broader entities. For articles, products, organisations and other content types, that machine-readable context still matters.

If your site has weak schema, missing article markup, poor entity signals, or inconsistent business information, fix that before you chase speculative wins.
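For illustration, a minimal JSON-LD Article block in the page's head might look like this (the values are placeholders to be replaced with your own details):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Does llms.txt actually matter?",
  "datePublished": "2026-04-23",
  "author": {
    "@type": "Person",
    "name": "Emily"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Agency"
  }
}
</script>
```

The same approach applies to Product, Organization and other schema.org types; the win is consistent, machine-readable context across the whole site, not markup on one page.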

4. Sort out indexation signals

Google’s own docs are blunt: robots.txt is not a mechanism for keeping a page out of Google. Use noindex or access controls when that’s your goal. OpenAI’s publisher docs make a similar point for ChatGPT search behaviour.

That means your indexation logic needs to be intentional. No accidental blocking. No contradictory signals. No mystery pages floating around because nobody checked the basics.
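For reference, the documented way to keep a page out of an index is a noindex signal the crawler can actually see, for example a meta tag in the page's head:

```html
<!-- Keep this page out of search indexes -->
<meta name="robots" content="noindex">
```

The equivalent for non-HTML resources is the `X-Robots-Tag: noindex` HTTP response header. Either way, the page must remain crawlable: a robots.txt Disallow only blocks fetching, which means the crawler never sees the noindex at all.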

5. Publish content that’s actually worth citing

This is where the conversation gets less technical and more useful.

AI systems and search engines both need something solid to work with: clear claims, original thinking, up-to-date information, named expertise, and pages that answer real questions cleanly. Google’s structured data and crawling docs don’t replace quality, and neither does llms.txt. They just help machines process what’s already there.

If your content says the same thing as every other industry blog on the internet, no formatting trick is going to make it memorable.

Where we land on it

llms.txt is worth treating as a nice-to-have, especially for documentation-heavy sites, developer products, knowledge bases, or brands with a lot of useful long-form content. It’s lightweight, low-risk, and it may become more widely supported over time because the format is simple and sensible.

But for most brands, it belongs behind a longer queue of more important work:

  • unblock the right AI crawlers
  • tighten crawlability
  • improve internal linking
  • add structured data
  • clean up indexation
  • publish sharper, more cite-worthy content

Do that, and llms.txt becomes a tidy extra.

Skip that, and llms.txt becomes theatre.

The practical answer

Should you add one?

Probably, yes — if the basics are already in decent shape.

Should you expect it to transform your visibility in AI search on its own?

No. Not based on the platform documentation and evidence we’ve got today.

That’s the real takeaway. Don’t ignore it. Don’t worship it either.

Fix the foundations first.
