ClaudeBot blocked by robots.txt
Anthropic runs more than one crawler. Blocking ClaudeBot is an opt-out from training, not from live Claude citations.
Your robots.txt disallows ClaudeBot, either explicitly or via a catch-all. ClaudeBot honors robots.txt, so Anthropic's training pipeline will not fetch the URLs the rule matches. Depending on how the rules are written, the block may also incidentally cover Claude-SearchBot and Claude-User.
Anthropic separates its crawlers by purpose. ClaudeBot is used for training. Claude-SearchBot powers retrieval when Claude answers questions with web context. Claude-User fetches a page on demand when an individual user asks Claude to visit it. A broad block can unintentionally cover all three, which removes Claude's ability to cite you at all, not just to train on you.
- A blanket User-agent: * Disallow: / rule applies to every Claude agent.
- An "AI block" template lists ClaudeBot but gives Claude-SearchBot and Claude-User no separate treatment.
- A CDN rule blocks anything matching /claude/i at the edge, including user-triggered fetches.
- A staging robots.txt leaked to production with a global Disallow.
- The block is intentional but not documented, so new pages inherit it silently.
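The catch-all case above is easy to reproduce with Python's standard-library robots.txt parser. This is a minimal sketch; the rules and URL are illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# A blanket block, as in the first cause above (illustrative rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The catch-all group applies to every agent, so all three are blocked.
for agent in ("ClaudeBot", "Claude-SearchBot", "Claude-User"):
    print(agent, parser.can_fetch(agent, "https://example.com/post"))
```

All three checks come back False: a catch-all group does not need to name a Claude agent to block it.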
- Run your URL through AI Crawler Checker.
- Inspect the rows for ClaudeBot, Claude-SearchBot, and Claude-User separately.
- Note which robots.txt group each one matched and whether the rule was User-agent-specific or a catch-all.
- Confirm the page itself is 200 with real server-rendered content, so allowed Claude crawlers would actually have something to parse.
1. List Anthropic crawlers explicitly
Add separate User-agent groups for ClaudeBot, Claude-SearchBot, and Claude-User, and give each the policy that matches your intent.
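A sketch of what that can look like, assuming you want to opt out of training while staying citable; the policies are yours to choose:

```
# Training crawler: opted out
User-agent: ClaudeBot
Disallow: /

# Retrieval for cited answers: allowed
User-agent: Claude-SearchBot
Allow: /

# User-triggered fetches: allowed
User-agent: Claude-User
Allow: /
```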
2. Separate training from retrieval
To opt out of training only, disallow ClaudeBot and allow Claude-SearchBot and Claude-User. Claude can then cite you without training on you.
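One way to sanity-check that split locally is Python's standard-library parser. The rules and URL below are illustrative; real crawlers may interpret robots.txt slightly differently than urllib does:

```python
from urllib.robotparser import RobotFileParser

# Training opted out, everything else allowed (illustrative rules).
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
for agent in ("ClaudeBot", "Claude-SearchBot", "Claude-User"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(f"{agent}: {verdict}")
```

ClaudeBot matches its named group and is disallowed; Claude-SearchBot and Claude-User fall through to the catch-all and stay allowed.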
3. Check the edge stack
Edge rules that block by regex can catch more than you want. Audit your CDN, WAF, and bot-management configs.
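To see why a case-insensitive claude regex at the edge is too broad, here is a minimal sketch; the user-agent strings are shortened illustrations, not Anthropic's exact tokens:

```python
import re

# A hypothetical edge rule: block any user agent matching /claude/i.
edge_block = re.compile(r"claude", re.IGNORECASE)

user_agents = [
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",         # training
    "Mozilla/5.0 (compatible; Claude-SearchBot/1.0)",  # retrieval
    "Mozilla/5.0 (compatible; Claude-User/1.0)",       # user-triggered
]

# All three match, so the rule blocks citations as well as training.
blocked = [ua for ua in user_agents if edge_block.search(ua)]
print(len(blocked))  # 3
```

A rule meant for the training crawler ends up blocking user-triggered fetches too, which is exactly the failure mode this step is checking for.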
4. Confirm with AI Crawler Checker
After changes, re-run the checker. The matrix should show allowed/disallowed exactly as you intended for each Anthropic agent.
Which Claude crawler should I allow?
If you want Claude to cite your content, allow Claude-SearchBot and Claude-User. ClaudeBot exists only for training; whether to allow it is a separate question about whether your content may be used to improve future models.
Does robots.txt guarantee privacy?
No. robots.txt is a politeness signal that Anthropic's crawlers honor, but it is not a privacy mechanism. Content that needs to be private should be behind authentication, not just disallowed in robots.txt.
Can I block training but allow search?
Yes. Disallow ClaudeBot and allow Claude-SearchBot. That lets Claude cite your pages in answers without using them to train future models.
Ready to diagnose your URL?
AI Crawler Checker runs the exact checks discussed above.