Generate an llms.txt file (served at /llms.txt) that tells large language models like ChatGPT, Claude and Perplexity which URLs on your site are worth reading. This generator crawls your sitemap, groups URLs into clean sections and gives you a copy-paste-ready file in under 10 seconds. No signup.

What is llms.txt?
llms.txt is a plain-text manifest, written in markdown, that lives at the root of your domain. Its job is to give large language models a curated tour of your site: a title, a short description of the project, and then a series of H2 headings with bullet-pointed links beneath each one. The format is small enough to fit on a postcard but expressive enough to capture the structure of even a 10,000-page site.
The file follows a v0.1 specification published by the llmstxt.org working group in 2024. Adoption has been brisk: Anthropic ships llms.txt files for the Claude API docs, Mintlify auto-generates them for every site it hosts, and Cursor reads them when indexing a codebase. The pattern matters because LLMs reason better from a curated reading list than from a 50,000-URL sitemap stuffed with archive pages, tag indexes and pagination.
The minimum valid file
The smallest valid llms.txt is just an H1 line carrying the project title. The most useful version - which is what this generator produces - includes a description blockquote and several H2-headed sections, each containing markdown links with optional descriptions. That structure mirrors how an LLM looks at a website during retrieval: hierarchically, by topic, with the title carrying the most weight.
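For illustration, here is what a small but complete file could look like - the domain, page names and descriptions are placeholders:

```markdown
# Example Co

> Example Co makes time-tracking software for agencies. Start with the
> product and pricing pages; the docs cover the API.

## Product
- [Product overview](https://example.com/product): What the tool does and who it is for.
- [Pricing](https://example.com/pricing): Plans, limits and billing FAQ.

## Documentation
- [Quickstart](https://example.com/docs/quickstart): Set up in five minutes.
- [API reference](https://example.com/docs/api)
```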
Why your site needs llms.txt in 2026
AI assistants are no longer a novelty traffic source. Roughly 200 million people use ChatGPT every week, Perplexity routes tens of millions of buying-intent queries each month, and Google's AI Overviews appear above the organic results for an estimated 40% of informational searches. When a user asks an AI for "the best tool for X", the answer comes from a mixture of training data and live retrieval. If your site is poorly structured or your sitemap is bloated, the model will struggle to find and cite your most relevant content.
A handcrafted llms.txt changes the calculus. You decide which pages matter, in what order, and how they are grouped. AI engines that honour the file get the curated version. Even those that do not - yet - benefit indirectly: many third-party indexers and retrieval-augmented systems treat llms.txt as a high-priority hint when building their own crawl frontier.
Most teams discover their sitemap was the problem only after they sit down to write llms.txt. The exercise of choosing 200 URLs that actually represent the business is the highest-leverage information architecture audit you'll do all year.
How this llms.txt generator works
The generator does the boring part - fetching your sitemap, normalising URLs, grouping them by intent. You do the high-leverage part - editing, reordering, sharpening titles. End to end, most teams finish in 20 minutes.
- You enter a root domain. Either `example.com` or `https://example.com` works. We strip paths and query strings.
- We fetch /sitemap.xml and /sitemap_index.xml using our SSRF-hardened crawler. If your sitemap is an index pointing to nested sitemaps, we follow up to eight levels deep.
- Each URL is parsed and categorised by path prefix into one of 12 buckets: Home, Product, Pricing, Tools, Use Cases, Integrations, Comparisons, Documentation, Blog, Company, Contact, Other (see the sketch after this list).
- We render markdown with up to 500 URLs total and at most 50 per section, prioritised in the order most LLMs surface (Home, Product, Pricing first; Other last).
- You copy or download the output as a plain text file and upload it to your web root.
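To make the bucketing step concrete, here is a minimal Python sketch of the idea. The prefix table is an illustrative assumption, not our actual rules, which cover far more path patterns:

```python
from urllib.parse import urlparse

# Illustrative path-prefix -> section table (assumed, not the real rules).
# Order matters: the first matching prefix wins.
PREFIX_BUCKETS = [
    ("/pricing", "Pricing"),
    ("/docs", "Documentation"),
    ("/blog", "Blog"),
    ("/integrations", "Integrations"),
    ("/compare", "Comparisons"),
    ("/contact", "Contact"),
]

def bucket_for(url: str) -> str:
    """Assign a URL to a section by matching its path prefix."""
    path = urlparse(url).path or "/"
    if path == "/":
        return "Home"
    for prefix, bucket in PREFIX_BUCKETS:
        if path.startswith(prefix):
            return bucket
    return "Other"  # everything unrecognised lands in the last section

assert bucket_for("https://example.com/") == "Home"
assert bucket_for("https://example.com/docs/api") == "Documentation"
assert bucket_for("https://example.com/careers") == "Other"
```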
The llms.txt spec, explained
The format is small. Here is the full surface area you actually need to know.
1. H1 title (required)
The first non-empty line is an H1 with the project name. This becomes the LLM's primary anchor when it cites you.
2. Description blockquote (optional but recommended)
A markdown blockquote (a line starting with `>`) of one or two sentences that explains what the site is. Treat this as your AI elevator pitch. Many engines surface this verbatim.
3. H2-grouped link sections
Each H2 is a topic. Beneath the H2, list bullet-pointed markdown links. You can optionally add a colon-separated description after each link. Order matters - LLMs treat earlier sections as higher priority.
4. Optional extras
The spec leaves room for llms-full.txt (a longer file with full text content for offline indexing) and language tags. The generator emits a v0.1-compliant llms.txt; the longer variant is on our roadmap.
Where to host the file
The file must live at the root of your domain so it is reachable at https://yourdomain.com/llms.txt. The HTTP response must be 200 OK and the content-type should be text/plain or text/markdown. CDNs need to let the file through unmodified - watch out for HTML-only edge rewrites that intercept everything except /api/*.
Hosting cheat-sheet by stack
| Stack | Where to put the file | Notes |
|---|---|---|
| Next.js | public/llms.txt | Served as-is. Works on Vercel, Netlify, DO App Platform. |
| WordPress | Web root, alongside wp-config.php | Disable any plugin that rewrites unknown paths. |
| Webflow / Framer | Custom code section or asset upload | Both platforms now allow root-level static files. |
| Shopify | Theme assets + redirect rule | Use a 200 redirect to a hosted text file in /files. |
| Static site (Hugo, Jekyll, Astro) | static/llms.txt or public/llms.txt | No special config required. |
| Cloudflare Pages | Build output root | Confirm it is not gzipped to a non-text content-type. |
How to validate your llms.txt
- Visit `https://yourdomain.com/llms.txt` in a browser. You should see plain markdown, not a 404 or an HTML page.
- Check the response headers in DevTools. `content-type` should start with `text/`.
- Confirm there is no `x-frame-options`, `content-disposition: attachment` or other header that breaks programmatic fetching.
- Run the file through a markdown linter (any will do). Broken syntax means LLMs will mis-parse the sections.
- Use our AI Crawler Checker against `/llms.txt` to confirm GPTBot, ClaudeBot and PerplexityBot are not blocked from fetching it. A script that automates the first three checks follows this list.
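If you prefer to script the checks, here is a minimal sketch using only the Python standard library. It covers status, content-type, hostile headers and the H1 rule; the URL is a placeholder and the markdown-lint step is left to your linter of choice:

```python
import urllib.error
import urllib.request

URL = "https://yourdomain.com/llms.txt"  # placeholder: use your own domain

def check_llms_txt(url: str) -> list[str]:
    """Fetch the file and return a list of problems; empty means it passed."""
    req = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req) as resp:
            headers = resp.headers
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on any non-2xx status, e.g. a 404 at the wrong path
        return [f"expected 200 OK, got {err.code}"]

    problems = []
    ctype = headers.get("Content-Type", "")
    if not ctype.startswith("text/"):
        problems.append(f"content-type should start with text/, got {ctype!r}")
    if "attachment" in headers.get("Content-Disposition", ""):
        problems.append("content-disposition: attachment breaks programmatic fetching")
    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
    if not first_line.startswith("# "):
        problems.append("first non-empty line should be an H1 title")
    return problems

for msg in check_llms_txt(URL) or ["all checks passed"]:
    print(msg)
```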
Tips for a high-signal llms.txt
The default generator output is a strong starting point. The teams that get the most lift from llms.txt do these five things on top.
- Lead with money pages. Pricing, the product home, your top comparison page. The first 10 URLs in the file carry the most weight.
- Cap each section at 10-20 URLs. AI engines reward density over breadth. A 50-URL Blog section dilutes signal; pick the 10 best posts.
- Cut archive and tag pages. Anything that is a list of other links rarely belongs in llms.txt. Spotlight the destinations, not the indexes.
- Add one-line descriptions to your top 20 URLs. Format: `- [Title](url): one-sentence description`. This dramatically improves citation quality.
- Re-run after every launch. A new pricing page, a new use-case page, a new comparison - all should be added to llms.txt the same day they ship.
llms.txt vs robots.txt vs sitemap.xml
These three files solve different problems. Most professional sites need all three.
| File | Purpose | Audience | Format |
|---|---|---|---|
| robots.txt | Exclusion - which paths crawlers must not visit | All crawlers (search + AI) | Plain text directives |
| sitemap.xml | Discovery - the full list of URLs you want indexed | Search engines primarily | XML, machine-generated |
| llms.txt | Curation - the URLs AI should focus on, with structure | LLMs and AI answer engines | Markdown, human-edited |
Common mistakes
- Treating llms.txt like a sitemap. Dumping every URL defeats the point. Curate ruthlessly.
- Wrong content-type. A file served as `application/octet-stream` may be downloaded by browsers but skipped by some indexers.
- Forgetting the H1. Without an H1 title, parsers treat the file as malformed.
- Hosting at `/llms-txt` or `/llms`. The path is `/llms.txt` exactly. Aliases do not count.
- Blocking GPTBot or ClaudeBot. If your robots.txt blocks the AI crawlers, llms.txt may never be fetched. Check with the AI Crawler Checker, or start from the illustrative robots.txt fragment after this list.
- Letting the file go stale. If your llms.txt still reads "Beta launch coming Q2 2025" in mid-2026, it actively damages trust.
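On the crawler-blocking point, here is an illustrative robots.txt fragment that explicitly allows the three AI crawlers named above; the user-agent tokens are the ones published by OpenAI, Anthropic and Perplexity:

```
# Illustrative: let AI crawlers reach the whole site, including /llms.txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```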
Use cases by team type
- SaaS marketing teams use it to spotlight pricing, integrations and use-case pages so AI engines recommend them in buying-intent queries.
- Documentation teams use it to give LLM-powered IDEs (Cursor, Continue) a clean entry point into their docs, improving developer experience.
- Ecommerce sites use it to highlight category and bestseller pages over the long tail of out-of-stock variants.
- Agencies generate llms.txt for every client as part of an AI visibility audit. Pair it with our AI Readiness Audit for a polished deliverable.
- Local businesses spotlight service pages and city-specific landing pages so local AI queries surface them.
Glossary
| Term | Meaning |
|---|---|
| llms.txt | Plain-text manifest at /llms.txt that curates URLs for LLM consumption. |
| llms-full.txt | Optional companion file containing full-text content for offline indexing. |
| GEO | Generative Engine Optimization - the discipline of structuring content so AI engines surface it. |
| Retrieval-augmented generation (RAG) | Architecture where an LLM fetches live documents to ground its answer. |
| Sitemap index | An XML file that points to multiple smaller sitemaps. Common on sites with thousands of URLs. |