Sitemap returns HTML instead of XML
Your sitemap looks fine in a browser tab, but Google won't accept it. Almost always, your origin is silently returning HTML for a URL Google expects to be XML.
When a sitemap URL responds with HTML (a login page, an app shell, a 404 template, or any other rendered page), Google's parser finds no <urlset> or <sitemapindex> root element and discards the response. In your browser, the same URL can look perfectly normal.
This is one of the most common causes of "Couldn't fetch" or "Sitemap could not be read" in Search Console. Until the response is real XML, none of the URLs in that sitemap will reach Google through the sitemap channel.
- A single-page application catches /sitemap.xml and serves the app shell.
- A wildcard rewrite or framework router renders a fallback HTML page for unknown routes.
- A CDN serves a custom 404 or 403 page as HTML, often with a 200 status, instead of a plain error response.
- A login wall or maintenance page intercepts /sitemap.xml.
- An auth proxy redirects to a sign-in page that returns HTML.
- A browser extension or service worker rewrites the response, but the server itself is fine.
- Use Sitemap Checker to fetch the URL from a clean server-side request.
- Inspect the Content-Type header. If it starts with text/html, that's your bug.
- Look at the first few bytes of the response. <!doctype html>, <html, or any HTML tag means it's not a sitemap.
- Run curl -I, then repeat the request with a Googlebot User-Agent (curl -A), to rule out user-agent-specific behavior.
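The header and first-bytes checks above can be sketched in Python as a rough local approximation. The names `diagnose` and `fetch` are hypothetical helpers, the 1024-byte window is an arbitrary choice, and the sketch assumes an uncompressed sitemap (a gzipped .xml.gz file would need different handling). Sending a Googlebot User-Agent string only imitates the header; it does not reproduce Google's IP ranges.

```python
import urllib.error
import urllib.request

XML_TYPES = ("application/xml", "text/xml")
SITEMAP_ROOTS = (b"<urlset", b"<sitemapindex")
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def diagnose(content_type: str, body: bytes) -> list[str]:
    """Return a list of problems; an empty list means the response looks like a sitemap."""
    problems = []
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        problems.append("Content-Type is text/html")
    elif media_type not in XML_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or '(missing)'}")

    # Look only at the start of the body, ignoring a UTF-8 BOM and whitespace.
    head = body[:1024].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith((b"<!doctype html", b"<html")):
        problems.append("body starts with HTML markup")
    elif not any(root in head for root in SITEMAP_ROOTS):
        problems.append("no <urlset> or <sitemapindex> near the start of the body")
    return problems


def fetch(url: str, user_agent: str = GOOGLEBOT_UA):
    """Fetch a URL server-side and return (status, content_type, first_bytes)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get("Content-Type", ""), response.read(1024)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry headers and a body worth inspecting.
        return err.code, err.headers.get("Content-Type", ""), err.read(1024)
```

A possible use, with a placeholder URL: `status, ctype, head = fetch("https://example.com/sitemap.xml")` followed by `print(status, diagnose(ctype, head))`. Note that `urlopen` follows redirects, so a redirect to a sign-in page shows up as the final HTML response rather than as a 3xx status.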
1. Stop the framework from catching the sitemap route
Add an explicit route or rewrite rule for /sitemap.xml (and any child sitemaps) so the framework's catch-all does not intercept it.
2. Set Content-Type to application/xml
Whatever generates the response (a Next.js route, an Express handler, Nginx, or S3 object metadata) should set Content-Type to application/xml or text/xml.
3. Fix custom 404 and 403 pages on the CDN
If the file is missing, return a real 404 status with an empty body or a tiny text body. Don't return 200 with an HTML error page.
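Steps 1 to 3 can be illustrated together with a minimal WSGI sketch. It is not tied to any framework named above: `SITEMAPS`, `SITEMAP_ROUTE`, `APP_SHELL`, `respond`, `app`, and `call` are all hypothetical names, and the in-memory dictionary stands in for whatever actually produces your sitemap. The point is the ordering: the explicit sitemap route is checked before the catch-all, it sets an XML Content-Type, and a missing sitemap gets a real 404 with a tiny text body.

```python
import re

# Hypothetical in-memory store; a real site would read files or generate XML.
SITEMAPS = {
    "/sitemap.xml": (
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        b"  <url><loc>https://example.com/</loc></url>\n"
        b"</urlset>\n"
    ),
}
# Matches /sitemap.xml and child sitemaps such as /sitemap-posts.xml.
SITEMAP_ROUTE = re.compile(r"/sitemap[^/]*\.xml")
APP_SHELL = b'<!doctype html><html><body><div id="app"></div></body></html>'


def respond(start_response, status, content_type, body):
    headers = [("Content-Type", content_type), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")

    # Step 1: the explicit sitemap route runs before the catch-all.
    if SITEMAP_ROUTE.fullmatch(path):
        body = SITEMAPS.get(path)
        if body is None:
            # Step 3: a real 404 with a tiny text body, not an HTML page with 200.
            return respond(start_response, "404 Not Found",
                           "text/plain; charset=utf-8", b"sitemap not found\n")
        # Step 2: an XML Content-Type.
        return respond(start_response, "200 OK",
                       "application/xml; charset=utf-8", body)

    # Catch-all: every other route gets the HTML app shell.
    return respond(start_response, "200 OK", "text/html; charset=utf-8", APP_SHELL)


def call(path):
    """Invoke the app in-process and return (status, content_type, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"]["Content-Type"], body
```

The `call` helper exercises the app without starting a server, for example `call("/sitemap.xml")`. To try it over HTTP locally, the standard library's `wsgiref.simple_server.make_server` can serve `app` on a port of your choice.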
4. Bypass auth for sitemap URLs
If the site is behind a login wall or password, exempt /sitemap*.xml so search engines and your own diagnostic tools can fetch it without redirects.
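One way to express the /sitemap*.xml exemption is a path check that runs before the auth layer decides to redirect. This is a sketch under assumptions: `requires_login` and `SITEMAP_PATH` are hypothetical names, and the regular expression deliberately limits the wildcard to a single path segment, which is stricter than a plain glob.

```python
import re

# /sitemap*.xml, with the wildcard restricted to one path segment (no "/").
SITEMAP_PATH = re.compile(r"/sitemap[^/]*\.xml")


def requires_login(path: str) -> bool:
    """Return False for sitemap URLs so they are served without a redirect."""
    return SITEMAP_PATH.fullmatch(path) is None
```

Keeping the wildcard to one segment means a path like /sitemap/private.xml still goes through auth. Loosen the pattern only if your child sitemaps really live in subdirectories.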
5. Re-test from a server-side fetch
Don't trust the browser. Use Sitemap Checker again to confirm Content-Type and body shape from a server perspective before resubmitting.
Why does my browser show a page but Google sees invalid XML?
Browsers tolerate broken responses and try to render whatever the server returns. Google's sitemap parser is strict: it needs an XML content type (application/xml or text/xml) and a recognized sitemap root element, <urlset> or <sitemapindex>.
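To get a feel for what "strict" means, you can run the body through a real XML parser and check the root element yourself. This is only a local approximation using Python's standard library, not a description of how Google's parser works; `sitemap_root` is a hypothetical helper, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_root(body: bytes):
    """Return "urlset" or "sitemapindex" if the body is a sitemap, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Typical HTML (unclosed tags, lowercase doctype) is not well-formed XML.
        return None
    for name in ("urlset", "sitemapindex"):
        if root.tag == f"{{{SITEMAP_NS}}}{name}":
            return name
    # Well-formed XML, but the root is something else (for example <html>).
    return None
```

A page that renders fine in a browser fails here for one of two reasons: either it is not well-formed XML at all, or it parses but its root element is not a sitemap element.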
Is text/html always bad for a sitemap?
Yes. Even if the body is valid XML, sending text/html tells Google to treat the response as a page, not a sitemap. Use application/xml or text/xml.
Can browser extensions affect sitemap previews?
They can affect what you see in the browser (pretty-printers, dev tools, JSON viewers), but they have no impact on what Google sees. Always validate from a clean server-side fetch.
Ready to diagnose your URL?
Sitemap Checker runs the exact checks discussed above.