Definition
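GPTBot is OpenAI's web crawler. It fetches publicly accessible pages to gather training data for the GPT model family. Disallowing GPTBot in robots.txt keeps your pages out of future training crawls. robots.txt only governs future fetches, so it does not pull back pages collected earlier. Over time, that absence can lower how often the default ChatGPT model mentions your brand from memory.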
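As a minimal sketch of how such a rule is evaluated, the snippet below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and reports which crawler tokens may fetch a page. The robots.txt contents, the `example.com` URL, and the `crawler_access` helper are assumptions made for illustration. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training crawls, keep search retrieval open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return whether each user-agent token may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative URL; swap in a page from your own site.
access = crawler_access(
    ROBOTS_TXT, "https://example.com/pricing", ["GPTBot", "OAI-SearchBot"]
)
print(access)  # {'GPTBot': False, 'OAI-SearchBot': True}
```

Running the same check against your live robots.txt is a quick way to confirm that a GPTBot rule did not also catch a crawler you meant to keep.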
Why it matters
GPTBot sits in the "Crawlers & infrastructure" layer of the AI search stack. Teams that manage crawler access deliberately tend to get cited more, recommended more, and earn more of the AI-mediated revenue in their category. Teams that ignore this layer can spend a year wondering why their content investment never moves the needle inside ChatGPT or Perplexity.
Related terms
- OAI-SearchBot - OpenAI's live-retrieval crawler for ChatGPT Search. Separate from GPTBot and must be allowed independently.
- robots.txt - The decades-old file telling web crawlers what they can fetch. Used to allow or block GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and others.
- Training corpus - The dataset an LLM was trained on. Brands that appear frequently and consistently in the training corpus are recalled by name in answers, with no live retrieval required.
- LLM SEO - Optimizing for large language models: making sure ChatGPT, Claude, Gemini, Perplexity, and Grok know about, cite, and recommend your brand.
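Because each crawler token listed above gets its own robots.txt group, a per-crawler policy can be written down once and rendered into the file. The sketch below does that. The `POLICY` values and the `build_robots_txt` helper are hypothetical, and the tokens are the ones named in this glossary, so confirm each vendor's current user-agent string before relying on them.

```python
# Hypothetical policy: True allows the whole site, False blocks it.
POLICY = {
    "GPTBot": False,          # training crawler
    "OAI-SearchBot": True,    # live retrieval for ChatGPT Search
    "ClaudeBot": True,
    "PerplexityBot": True,
    "Google-Extended": False,
}


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one User-agent group per crawler token, separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(POLICY)
print(robots_txt)
```

Keeping the policy in one mapping makes the training-versus-retrieval decision explicit for each crawler, instead of leaving it implied by scattered rules.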
Apply it
The LLM SEO playbook ties every concept in this glossary into a single operating model. If you want to see how your brand performs across all the LLMs at once - mention rate, citation share, sentiment, rank - start with the free GEO audit or skip straight to a free Livesov account.