Fix common indexing problems
Plain-English guides for the indexing, crawling, canonical, redirect, social-preview, and AI-visibility problems IndexDoctor diagnoses every day.
- Sitemap
Google can't fetch sitemap: what it means and how to fix it
Search Console marks your sitemap as "Couldn't fetch." Here's what Google is actually telling you and the fastest way to verify it.
- Sitemap
Sitemap returns HTML instead of XML
Your sitemap looks fine in a browser tab, but Google won't accept it. Almost always, your origin is silently returning HTML for a URL Google expects to be XML.
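A quick way to confirm this is to sniff the first bytes of whatever the sitemap URL actually returns. This is a minimal sketch (the function names and heuristics are illustrative, not IndexDoctor's code): a real XML sitemap starts with an XML declaration or a `<urlset>`/`<sitemapindex>` root, while an error page or SPA shell starts with an HTML doctype.

```python
# Sketch: given the raw response body from a "sitemap" URL, guess whether
# the origin returned XML or quietly served an HTML page instead.
# Heuristics and names are illustrative assumptions, not a full validator.

def looks_like_xml_sitemap(body: bytes) -> bool:
    head = body.lstrip()[:200].lower()
    if head.startswith(b"<?xml"):
        return True
    # Sitemaps may omit the XML declaration but must use one of these roots.
    return head.startswith(b"<urlset") or head.startswith(b"<sitemapindex")

def looks_like_html(body: bytes) -> bool:
    head = body.lstrip()[:200].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

If `looks_like_html` fires on your sitemap URL, the problem is routing or middleware on your origin, not the sitemap file itself.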
- Crawling
Page blocked by robots.txt: why and what to do
robots.txt looks simple, but a single Disallow line in the wrong group can keep entire sections of your site out of Google's crawl.
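The group-matching behavior is easy to test locally with Python's standard-library parser. In this hypothetical robots.txt, the `Googlebot` group does not inherit anything from the `*` group, so one `Disallow` line in it blocks Google while every other crawler stays allowed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a crawler obeys only its most specific matching
# group, so the Googlebot group fully replaces the "*" group for Google.
robots_txt = """\
User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /blog/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "/blog/post"))      # False: blocked for Google
print(rp.can_fetch("SomeOtherBot", "/blog/post"))   # True: falls back to "*"
```

This is why adding a bot-specific group can unintentionally hide sections from that bot: the group replaces, rather than extends, the default rules.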
- Indexing
Page has noindex: why search engines won't index it
If a page tells Google "don't index me," Google obeys. The hard part is finding which layer of your stack added the directive.
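Two of the layers worth checking are the `X-Robots-Tag` response header and the `<meta name="robots">` tag in the HTML. A minimal sketch of both checks, using only the standard library (function and class names are illustrative):

```python
from html.parser import HTMLParser

def header_has_noindex(headers: dict) -> bool:
    # A noindex can arrive as an HTTP header, often added by a CDN,
    # reverse proxy, or framework config rather than the page template.
    return "noindex" in headers.get("X-Robots-Tag", "").lower()

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.noindex = True

def meta_has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

If the meta tag is clean but the header check fires, look at your hosting layer; if neither fires on the raw HTML but Google still reports noindex, suspect JavaScript injecting the tag after render.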
- Canonical
Canonical points elsewhere: when Google may choose another URL
A canonical tag is a hint, not a directive, but it's the strongest hint you can give Google about which URL is the preferred version.
- Redirects
Redirect chain too long: why it hurts crawling and UX
Every redirect hop is a small cost. Stack three or four of them on a key URL and you're losing time, signals, and sometimes rankings.
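An offline sketch of what a chain audit measures: model each redirect as a URL-to-Location mapping and count hops until a URL stops redirecting. (In practice you would issue requests and read 3xx `Location` headers; the dict, URLs, and function name here are hypothetical.)

```python
# Sketch: count redirect hops over a simulated chain. A real audit would
# follow live 3xx responses; this models them as a dict for illustration.

def chain_length(redirects: dict, start: str, limit: int = 10) -> int:
    hops, url, seen = 0, start, set()
    while url in redirects:
        if url in seen or hops >= limit:
            raise RuntimeError(f"redirect loop or too many hops at {url}")
        seen.add(url)
        url = redirects[url]
        hops += 1
    return hops

# Hypothetical chain: http -> https -> trailing slash stripped -> www
redirects = {
    "http://example.com/a/": "https://example.com/a/",
    "https://example.com/a/": "https://example.com/a",
    "https://example.com/a": "https://www.example.com/a",
}
print(chain_length(redirects, "http://example.com/a/"))  # 3 hops before the 200
```

The fix is usually to collapse the chain: make the first URL redirect directly to the final destination in a single hop.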
- Social
OG image not showing when sharing a link
Social previews are unforgiving. One missing tag, one wrong dimension, or one cached image and your link looks broken everywhere it's shared.
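Social scrapers read the `og:image` meta tag from your raw HTML, not your rendered page, so the first check is whether the tag is actually in the markup you serve. A minimal extraction sketch using the standard library (class and function names are illustrative):

```python
from html.parser import HTMLParser

class OGImageFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Open Graph tags use property=, not name=.
        if tag == "meta" and a.get("property") == "og:image":
            self.og_image = a.get("content")

def find_og_image(html: str):
    finder = OGImageFinder()
    finder.feed(html)
    return finder.og_image
```

If this returns `None` on your server-rendered HTML, the tag is being added client-side and most scrapers will never see it.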
- AI visibility
AI crawlers blocked: what it means for AI search visibility
Blocking AI crawlers in robots.txt is a real choice, but it's worth understanding what you're actually opting out of, and how to verify the block is doing what you think.
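One way to verify the block is to parse your robots.txt locally and ask it, per bot, what it allows. The robots.txt below is hypothetical; the user-agent strings are the bots' published names, and the point is that each bot matches only its own group:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two AI crawlers and nothing else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for bot in ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]:
    print(bot, "blocked" if not rp.can_fetch(bot, "/") else "allowed")
```

Note that blocking `GPTBot` here says nothing about `OAI-SearchBot`: each crawler needs its own group (or a `*` group) before a rule applies to it.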
- GEO
Page not LLM-readable: how to make content easier for AI systems to understand
AI crawlers can be allowed in robots.txt and still walk away with nothing useful. Most of the time the problem is not access, it's structure.
- AI visibility
GPTBot blocked by robots.txt
GPTBot is OpenAI's training crawler. Blocking it is a real policy choice, but it does not remove your site from ChatGPT by itself.
- AI visibility
OAI-SearchBot blocked: what it may mean for AI search visibility
OAI-SearchBot is the crawler that builds OpenAI's search index. If you block it, ChatGPT's search-backed answers will struggle to find you.
- AI visibility
ClaudeBot blocked by robots.txt
Anthropic runs more than one crawler. Blocking ClaudeBot is an opt-out from training, not from live Claude citations.
- AI visibility
PerplexityBot blocked by robots.txt
Perplexity is citation-heavy. Blocking PerplexityBot directly removes you from its answer engine.
- AI visibility
Google-Extended blocked: what it does and does not affect
Google-Extended is the cleanest example of "training vs. search" separation. You can block it without losing Google Search ranking.
- Canonical
Canonical points to a noindex page
Pointing a canonical at a noindex URL tells Google two contradictory things at once. Pick one direction and remove the conflict.
- Canonical
Canonical points to a redirected URL
Canonical tags should point at a URL that returns 200. Redirect targets hide the real canonical behind an extra hop.
- Social
OG image too small for social previews
Social platforms have opinions about OG image sizes. Miss the recommended dimensions and your preview looks broken.
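You can check a PNG's dimensions without any imaging library by reading the IHDR chunk, which sits at a fixed offset in every valid PNG. A sketch, assuming the commonly recommended 1200x630 Open Graph size as the threshold (the function names are illustrative):

```python
import struct

def png_size(data: bytes) -> tuple:
    # PNG layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR",
    # then big-endian width and height at bytes 16-24.
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])

def big_enough_for_og(data: bytes, min_w: int = 1200, min_h: int = 630) -> bool:
    # 1200x630 is the widely recommended OG image size, used here as an
    # illustrative threshold; individual platforms publish their own minimums.
    w, h = png_size(data)
    return w >= min_w and h >= min_h
```

JPEG and WebP need different parsing, but the idea is the same: verify the actual pixel dimensions of the file you serve, not the size you intended to export.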