Definition
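robots.txt is a plain-text file at the root of a site that tells crawlers which paths they may fetch. The major AI crawlers state that they respect it, so blocking them opts you out of training and (in some cases) retrieval. Make sure the AI agents you want are explicitly allowed - see our free AI Crawler Checker.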
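One way to verify this is to parse a robots.txt file and ask, agent by agent, whether a URL may be fetched. The sketch below uses Python's standard urllib.robotparser. The sample file, the example.com URL, and the check_ai_agents helper are illustrative assumptions, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Tokens taken from the related terms in this entry.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def check_ai_agents(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, for each AI agent token, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    for agent, allowed in check_ai_agents(SAMPLE_ROBOTS_TXT).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Agents without their own group fall back to the "User-agent: *" rules, which is why Google-Extended is checked here even though the sample file never names it. Note that urllib.robotparser applies rules in file order, so results for files that mix overlapping Allow and Disallow lines may differ from what a given crawler does.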
Why it matters
robots.txt sits in the "Crawlers & infrastructure" layer of the AI search stack. Teams that handle it well get cited more, recommended more, and earn more of the AI-mediated revenue in their category. Teams that ignore it spend a year wondering why their content investment never moves the needle inside ChatGPT or Perplexity.
Related terms
- llms.txt - A proposed standard file at the root of a site that tells LLM crawlers what content is available, in what format, and how to use it.
- GPTBot - OpenAI's crawler for ChatGPT training data. Identified by the user agent "GPTBot".
- ClaudeBot - Anthropic's web crawler for Claude training and (in some configurations) retrieval.
- PerplexityBot - Perplexity's live-retrieval crawler. Blocking it removes you from Perplexity citations.
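- Google-Extended - A robots.txt control token that lets you opt out of Google using your content for Gemini training. It is not a separate crawler: fetching is still done by Google's existing crawlers, and disallowing it does not affect Googlebot or classic search.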
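Because each of these tokens can carry its own rule, a per-agent policy can be written as a small mapping and rendered into robots.txt groups. This is a minimal sketch under assumptions: build_robots_txt, the policy dictionary, and the /private/ path are hypothetical names chosen for illustration, and the allow/block choices shown are not a recommendation.

```python
def build_robots_txt(policy: dict[str, bool], default_disallow: tuple[str, ...] = ()) -> str:
    """Render one robots.txt group per agent token, plus a catch-all group.

    policy maps an agent token to True (allow the whole site) or False (block it).
    default_disallow lists paths that every other crawler should stay out of.
    """
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")

    # Catch-all group; an empty "Disallow:" line means nothing is blocked.
    fallback = ["User-agent: *"]
    fallback += [f"Disallow: {path}" for path in default_disallow] or ["Disallow:"]
    groups.append("\n".join(fallback))

    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Hypothetical policy: allow the retrieval and training crawlers, opt out of Gemini training.
    policy = {
        "GPTBot": True,
        "ClaudeBot": True,
        "PerplexityBot": True,
        "Google-Extended": False,
    }
    print(build_robots_txt(policy, default_disallow=("/private/",)))
```

Keeping the policy in one mapping makes it easy to review which agents are allowed before the file is published.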
Apply it
The LLM SEO playbook ties every concept in this glossary into a single operating model. If you want to see how your brand performs across all the LLMs at once - mention rate, citation share, sentiment, rank - start with the free GEO audit or skip straight to a free Livesov account.