
GPTBot blocked by robots.txt

GPTBot is OpenAI's training crawler. Blocking it is a real policy choice, but it does not remove your site from ChatGPT by itself.

What this usually means

Your robots.txt contains a Disallow rule that targets GPTBot (either under User-agent: GPTBot or the catch-all User-agent: *). When GPTBot fetches robots.txt and evaluates your URL, it chooses the most specific matching group and honors the Disallow, so those URLs are not fetched for training.
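As a sketch, either of these robots.txt groups (contents illustrative) would produce the block described above:

```
# Explicit group: GPTBot chooses this over the catch-all
User-agent: GPTBot
Disallow: /

# Catch-all: applies to GPTBot only when no GPTBot-specific group exists
User-agent: *
Disallow: /
```

When both groups are present, GPTBot honors only the more specific User-agent: GPTBot group and ignores the catch-all.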

Why it matters

GPTBot is OpenAI's crawler for training future models. It is different from OAI-SearchBot, which powers ChatGPT's search index, and from ChatGPT-User, which fetches URLs on demand when a user references them. Blocking GPTBot is an opt-out from training. It does not by itself opt you out of live citations in ChatGPT or of ChatGPT browsing. Many publishers confuse these and block the wrong user-agent.

Common causes
  • A copy-pasted "block AI" robots.txt template bundled GPTBot, OAI-SearchBot, and ChatGPT-User into one Disallow block.
  • A broad User-agent: * Disallow: / rule effectively blocks every AI crawler, including GPTBot, without an explicit allow list for search crawlers.
  • A migration from staging left a blanket Disallow in production robots.txt.
  • A CDN or WAF blocks the GPTBot user-agent at the edge, which overrides robots.txt and returns 403.
  • A policy decision was made intentionally, but without a plan for OAI-SearchBot and PerplexityBot, which drive live citations.
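The first cause above typically looks like this: a single group that bundles all three OpenAI user-agents and, perhaps unintentionally, opts the site out of live citations along with training (an illustrative template, not a recommendation):

```
# Copy-pasted "block AI" template
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /
```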
How to diagnose it
  1. Open AI Crawler Checker and paste the page URL.
  2. Check the per-crawler matrix row for GPTBot.
  3. Look at whether the matched group is an explicit User-agent: GPTBot group or the catch-all User-agent: *.
  4. Confirm that your edge/CDN is not also returning 403 for GPTBot.
  5. Compare GPTBot, OAI-SearchBot, and ChatGPT-User side by side to see the current policy.
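Steps 2 through 5 can be approximated locally with Python's standard-library robots.txt parser. A minimal sketch; the robots.txt body below is illustrative, so substitute your own file's contents:

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: training blocked, live search retrieval allowed.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check each OpenAI user-agent against the same URL, side by side.
url = "https://example.com/article"
for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Note that ChatGPT-User comes back allowed here: with no matching group and no catch-all, an unmatched user-agent defaults to allowed.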
How to fix it
  1. Decide training vs. live retrieval separately. If your goal is to opt out of training, keep GPTBot disallowed, but add an explicit Allow group for OAI-SearchBot so live answers can still cite you.
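For example, a robots.txt implementing that split policy might look like this (a sketch, assuming a blanket opt-out from training and full access for search retrieval):

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Keep live search retrieval and citations available
User-agent: OAI-SearchBot
Allow: /
```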

  2. Use explicit user-agent groups. Add a User-agent: GPTBot group instead of relying only on User-agent: *. This makes the policy auditable and avoids accidentally covering newer crawlers.

  3. Align CDN/WAF rules with robots.txt. If your CDN blocks AI user-agents at the edge, that overrides robots.txt. Decide the policy once and apply it in both places.
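The alignment check reduces to a small predicate. A minimal sketch (names and statuses are illustrative): the robots.txt verdict on one side, the HTTP status your edge returns for the crawler's user-agent on the other:

```python
def edge_overrides_robots(robots_allows: bool, edge_status: int) -> bool:
    """True when the CDN/WAF blocks a crawler that robots.txt permits.

    robots_allows: whether robots.txt allows this user-agent
    edge_status:   HTTP status the edge returns for that user-agent
    """
    return robots_allows and edge_status >= 400

# Example: robots.txt has an Allow group for OAI-SearchBot,
# but a WAF rule answers its requests with 403.
print(edge_overrides_robots(True, 403))   # edge silently wins
print(edge_overrides_robots(False, 403))  # consistent block, no conflict
```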

  4. Re-check after each robots.txt change. Run AI Crawler Checker against your most important pages every time you change robots.txt or WAF rules. The matrix view tells you, per crawler, whether you are blocking exactly what you intended.

FAQ
Is GPTBot the same as ChatGPT-User?

No. GPTBot is OpenAI's training crawler. ChatGPT-User is an on-demand agent that fetches a URL when a user references it in a conversation. Blocking one does not block the other.

Will blocking GPTBot remove my site from ChatGPT?

Not directly. GPTBot controls whether your pages can be used to train future models. Live ChatGPT answers use OAI-SearchBot (for search) and ChatGPT-User (for user-triggered browsing). To control live citations you need to also control those user-agents.

Can I allow OAI-SearchBot while blocking GPTBot?

Yes. Add a User-agent: GPTBot group with Disallow: / and a separate User-agent: OAI-SearchBot group with Allow: /. This opts out of training while keeping your site available to ChatGPT's search-backed answers.

Ready to diagnose your URL?

AI Crawler Checker runs the exact checks discussed above.
