Definition
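Google-Extended is the robots.txt product token Google provides for controlling whether content it crawls may be used to train Gemini and related AI products. It is not a separate crawler with its own user agent string: Google fetches pages with its existing crawlers and treats Google-Extended rules purely as a usage control. Allowing it does not affect classic Google rankings. Blocking it removes you from future Gemini training datasets but keeps you indexable for Google Search and AI Overviews retrieval.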
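The allow/block behavior described above can be checked locally before a robots.txt change ships. The sketch below is a minimal illustration using Python's standard-library `urllib.robotparser`; the robots.txt contents, the `example.com` URL, and the `can_fetch` helper are hypothetical, and Python's matching rules are only an approximation of how Google itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Gemini training via the
# Google-Extended token while leaving every other crawler allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/pricing"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        verdict = "allowed" if can_fetch(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

With these assumed rules, the Google-Extended token is blocked site-wide while Googlebot falls through to the wildcard group and stays allowed, which mirrors the "opt out of training, stay indexable" setup.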
Why it matters
Google-Extended sits in the "Crawlers & infrastructure" layer of the AI search stack. Teams that handle it well tend to get cited more, recommended more, and earn more of the AI-mediated revenue in their category. Teams that ignore it can spend a year wondering why their content investment never moves the needle inside Gemini. The token has no effect on ChatGPT or Perplexity, which are controlled through their own robots.txt tokens such as GPTBot and PerplexityBot.
Related terms
- Gemini - Google's multimodal LLM family. Powers the Gemini app, AI Overviews, AI Mode, Workspace AI, and the Vertex AI / AI Studio APIs.
- robots.txt - The decades-old file telling web crawlers what they can fetch. Used to allow or block GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and others.
- Training corpus - The dataset an LLM was trained on. Brands that appear frequently and consistently in the training corpus are more likely to be recalled by name in answers, with no live retrieval required.
Apply it
The LLM SEO playbook ties every concept in this glossary into a single operating model. If you want to see how your brand performs across all the LLMs at once - mention rate, citation share, sentiment, rank - start with the free GEO audit or skip straight to a free Livesov account.