Generate an llms.txt file (served at /llms.txt) that tells large language models like ChatGPT, Claude and Perplexity which URLs on your site are worth reading. This generator crawls your sitemap, groups URLs into clean sections and gives you a copy-paste-ready file in under 10 seconds. No signup.

What is llms.txt?
llms.txt is a plain-text manifest, written in markdown, that lives at the root of your domain. Its job is to give large language models a curated tour of your site: a title, a short description of the project, and then a series of H2 headings with bullet-pointed links beneath each one. The format is small enough to fit on a postcard but expressive enough to capture the structure of even a 10,000-page site.
The file follows a v0.1 specification published by the llmstxt.org working group in 2024. Adoption has been brisk: Anthropic ships llms.txt files for the Claude API docs, Mintlify auto-generates them for every site it hosts, and Cursor reads them when indexing a codebase. The pattern matters because LLMs reason better from a curated reading list than from a 50,000-URL sitemap stuffed with archive pages, tag indexes and pagination.
The minimum valid file
The smallest valid llms.txt is just an H1 line carrying the project title. The most useful version - which is what this generator produces - includes a description blockquote and several H2-headed sections, each containing markdown links with optional descriptions. That structure mirrors how an LLM looks at a website during retrieval: hierarchically, by topic, with the title carrying the most weight.
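For illustration, here is what a small but complete file could look like - the domain, page names and descriptions are placeholders:

```markdown
# Example Co

> Example Co makes time-tracking software for agencies. Start with the
> product and pricing pages; the docs cover the API.

## Product
- [Product overview](https://example.com/product): What the tool does and who it is for.
- [Pricing](https://example.com/pricing): Plans, limits and billing FAQ.

## Documentation
- [Quickstart](https://example.com/docs/quickstart): Set up in five minutes.
- [API reference](https://example.com/docs/api)
```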
Why your site needs llms.txt in 2026
AI assistants are no longer a novelty traffic source. Roughly 200 million people use ChatGPT every week, Perplexity routes tens of millions of buying-intent queries each month, and Google's AI Overviews appear above the organic results for an estimated 40% of informational searches. When a user asks an AI for "the best tool for X", the answer comes from a mixture of training data and live retrieval. If your site is poorly structured or your sitemap is bloated, the model will struggle to find and cite your most relevant content.
A handcrafted llms.txt changes the calculus. You decide which pages matter, in what order, and how they are grouped. AI engines that honour the file get the curated version. Even those that do not - yet - benefit indirectly: many third-party indexers and retrieval-augmented systems treat llms.txt as a high-priority hint when building their own crawl frontier.
Most teams discover their sitemap was the problem only after they sit down to write llms.txt. The exercise of choosing 200 URLs that actually represent the business is the highest-leverage information architecture audit you'll do all year.
How this llms.txt generator works
The generator does the boring part - fetching your sitemap, normalising URLs, grouping them by intent. You do the high-leverage part - editing, reordering, sharpening titles. End to end, most teams finish in 20 minutes.
- You enter a root domain. Either `example.com` or `https://example.com` works. We strip paths and query strings.
- We fetch /sitemap.xml and /sitemap_index.xml using our SSRF-hardened crawler. If your sitemap is an index pointing to nested sitemaps, we follow up to eight levels deep.
- Each URL is parsed and categorised by path prefix into one of 12 buckets: Home, Product, Pricing, Tools, Use Cases, Integrations, Comparisons, Documentation, Blog, Company, Contact, Other (see the sketch after this list).
- We render markdown with up to 500 URLs total and at most 50 per section, prioritised in the order most LLMs surface (Home, Product, Pricing first; Other last).
- You copy or download the output as a plain text file and upload it to your web root.
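To make the bucketing step concrete, here is a minimal Python sketch of the idea. The prefix table is an illustrative assumption, not our actual rules, which cover far more path patterns:

```python
from urllib.parse import urlparse

# Illustrative path-prefix -> section table (assumed, not the real rules).
# Order matters: the first matching prefix wins.
PREFIX_BUCKETS = [
    ("/pricing", "Pricing"),
    ("/docs", "Documentation"),
    ("/blog", "Blog"),
    ("/integrations", "Integrations"),
    ("/compare", "Comparisons"),
    ("/contact", "Contact"),
]

def bucket_for(url: str) -> str:
    """Assign a URL to a section by matching its path prefix."""
    path = urlparse(url).path or "/"
    if path == "/":
        return "Home"
    for prefix, bucket in PREFIX_BUCKETS:
        if path.startswith(prefix):
            return bucket
    return "Other"  # everything unrecognised lands in the last section

assert bucket_for("https://example.com/") == "Home"
assert bucket_for("https://example.com/docs/api") == "Documentation"
assert bucket_for("https://example.com/careers") == "Other"
```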
The llms.txt spec, explained
The format is small. Here is the full surface area you actually need to know.
1. H1 title (required)
The first non-empty line is an H1 with the project name. This becomes the LLM's primary anchor when it cites you.
2. Description blockquote (optional but recommended)
A markdown blockquote (a line starting with `>`) of one or two sentences that explains what the site is. Treat this as your AI elevator pitch. Many engines surface this verbatim.
3. H2-grouped link sections
Each H2 is a topic. Beneath the H2, list bullet-pointed markdown links. You can optionally add a colon-separated description after each link. Order matters - LLMs treat earlier sections as higher priority.
4. Optional extras
The spec leaves room for llms-full.txt (a longer file with full text content for offline indexing) and language tags. The generator emits a v0.1-compliant llms.txt; the longer variant is on our roadmap.
Where to host the file
The file must live at the root of your domain so it is reachable at https://yourdomain.com/llms.txt. The HTTP response must be 200 OK and the content-type should be text/plain or text/markdown. CDNs need to let the file through unmodified - watch out for HTML-only edge rewrites that intercept everything except /api/*.
Hosting cheat-sheet by stack
| Stack | Where to put the file | Notes |
|---|---|---|
| Next.js | public/llms.txt | Served as-is. Works on Vercel, Netlify, DO App Platform. |
| WordPress | Web root, alongside wp-config.php | Disable any plugin that rewrites unknown paths. |
| Webflow / Framer | Custom code section or asset upload | Both platforms now allow root-level static files. |
| Shopify | Theme assets + redirect rule | Use a 200 redirect to a hosted text file in /files. |
| Static site (Hugo, Jekyll, Astro) | static/llms.txt or public/llms.txt | No special config required. |
| Cloudflare Pages | Build output root | Confirm it is not gzipped to a non-text content-type. |
How to validate your llms.txt
- Visit `https://yourdomain.com/llms.txt` in a browser. You should see plain markdown, not a 404 or an HTML page.
- Check the response headers in DevTools. `content-type` should start with `text/`.
- Confirm there is no `x-frame-options`, `content-disposition: attachment` or other header that breaks programmatic fetching.
- Run the file through a markdown linter (any will do). Broken syntax means LLMs will mis-parse the sections.
- Use our AI Crawler Checker against `/llms.txt` to confirm GPTBot, ClaudeBot and PerplexityBot are not blocked from fetching it. A script that automates the first three checks follows this list.
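If you prefer to script the checks, here is a minimal sketch using only the Python standard library. It covers status, content-type, hostile headers and the H1 rule; the URL is a placeholder and the markdown-lint step is left to your linter of choice:

```python
import urllib.error
import urllib.request

URL = "https://yourdomain.com/llms.txt"  # placeholder: use your own domain

def check_llms_txt(url: str) -> list[str]:
    """Fetch the file and return a list of problems; empty means it passed."""
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req) as resp:
            headers = resp.headers
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on any non-2xx status, e.g. a 404 at the wrong path
        return [f"expected 200 OK, got {err.code}"]

    problems = []
    ctype = headers.get("Content-Type", "")
    if not ctype.startswith("text/"):
        problems.append(f"content-type should start with text/, got {ctype!r}")
    if "attachment" in headers.get("Content-Disposition", ""):
        problems.append("content-disposition: attachment breaks programmatic fetching")
    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
    if not first_line.startswith("# "):
        problems.append("first non-empty line should be an H1 title")
    return problems

for msg in check_llms_txt(URL) or ["all checks passed"]:
    print(msg)
```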
Tips for a high-signal llms.txt
The default generator output is a strong starting point. The teams that get the most lift from llms.txt do these five things on top.
- Lead with money pages. Pricing, the product home, your top comparison page. The first 10 URLs in the file carry the most weight.
- Cap each section at 10-20 URLs. AI engines reward density over breadth. A 50-URL Blog section dilutes signal; pick the 10 best posts.
- Cut archive and tag pages. Anything that is a list of other links rarely belongs in llms.txt. Spotlight the destinations, not the indexes.
- Add one-line descriptions to your top 20 URLs. Format: `- [Title](url): one-sentence description`. This dramatically improves citation quality.
- Re-run after every launch. A new pricing page, a new use-case page, a new comparison - all should be added to llms.txt the same day they ship.
llms.txt vs robots.txt vs sitemap.xml
These three files solve different problems. Most professional sites need all three.
| File | Purpose | Audience | Format |
|---|---|---|---|
| robots.txt | Exclusion - which paths crawlers must not visit | All crawlers (search + AI) | Plain text directives |
| sitemap.xml | Discovery - the full list of URLs you want indexed | Search engines primarily | XML, machine-generated |
| llms.txt | Curation - the URLs AI should focus on, with structure | LLMs and AI answer engines | Markdown, human-edited |
Common mistakes
- Treating llms.txt like a sitemap. Dumping every URL defeats the point. Curate ruthlessly.
- Wrong content-type. A file served as `application/octet-stream` may be downloaded by browsers but skipped by some indexers.
- Forgetting the H1. Without an H1 title, parsers treat the file as malformed.
- Hosting at `/llms-txt` or `/llms`. The path is `/llms.txt` exactly. Aliases do not count.
- Blocking GPTBot or ClaudeBot. If your robots.txt blocks the AI crawlers, llms.txt may never be fetched. Check with the AI Crawler Checker, or start from the illustrative robots.txt fragment after this list.
- Letting the file go stale. If your llms.txt still reads "Beta launch coming Q2 2025" in mid-2026, it actively damages trust.
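On the crawler-blocking point, here is an illustrative robots.txt fragment that explicitly allows the three AI crawlers named above; the user-agent tokens are the ones published by OpenAI, Anthropic and Perplexity:

```
# Illustrative: let AI crawlers reach the whole site, including /llms.txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```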
Use cases by team type
- SaaS marketing teams use it to spotlight pricing, integrations and use-case pages so AI engines recommend them in buying-intent queries.
- Documentation teams use it to give LLM-powered IDEs (Cursor, Continue) a clean entry point into their docs, improving developer experience.
- Ecommerce sites use it to highlight category and bestseller pages over the long tail of out-of-stock variants.
- Agencies generate llms.txt for every client as part of an AI visibility audit. Pair it with our AI Readiness Audit for a polished deliverable.
- Local businesses spotlight service pages and city-specific landing pages so local AI queries surface them.
Glossary
| Term | Meaning |
|---|---|
| llms.txt | Plain-text manifest at /llms.txt that curates URLs for LLM consumption. |
| llms-full.txt | Optional companion file containing full-text content for offline indexing. |
| GEO | Generative Engine Optimization - the discipline of structuring content so AI engines surface it. |
| Retrieval-augmented generation (RAG) | Architecture where an LLM fetches live documents to ground its answer. |
| Sitemap index | An XML file that points to multiple smaller sitemaps. Common on sites with thousands of URLs. |