Sitemap Analyzer & Robots.txt Checker

Analyze your sitemap.xml and robots.txt. Check crawl rules, count the URLs your sitemap lists, and find configuration issues.

Sitemap FAQ

What is a sitemap.xml file?
A sitemap.xml is a file that lists all the important URLs on your website. It helps search engines discover, crawl, and index your pages more efficiently.
What is robots.txt?
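robots.txt is a file that tells search engine crawlers which URLs they may or may not crawl. It helps manage crawl budget, but it does not reliably keep pages out of the index: a disallowed URL can still be indexed if other pages link to it. For private pages, use a noindex directive or authentication instead.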
Do I need both a sitemap and robots.txt?
Neither is strictly required, but having both is considered best practice. Your robots.txt should reference your sitemap URL with a Sitemap: line.
How many URLs can a sitemap have?
A single sitemap can list up to 50,000 URLs and must be no larger than 50 MB uncompressed. For larger sites, use a sitemap index that references multiple sitemaps.
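
For a site that exceeds the per-file limit, the split into child sitemaps plus an index might look roughly like the Python sketch below. It is only an illustration: the example.com URLs, the sitemap-N.xml file names, and the helper names chunk_urls and build_sitemap_index are assumptions, not part of this tool. It also only enforces the URL count, not the 50 MB size limit.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted in the FAQ above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemap_urls):
    """Return sitemap index XML (as a string) listing the child sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in child_sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 URLs: more than one file can hold.
urls = [f"https://example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical naming scheme for the child sitemap files.
children = [f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
index_xml = build_sitemap_index(children)

print(len(chunks), "child sitemaps")
```

Each chunk would then be written out as its own urlset file, and the index file is the one you reference from robots.txt.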

About this tool

Sitemaps fail in two directions, and most analyzers only catch one. The obvious failure is malformed XML or a sitemap that returns a 500. The subtler failure — and the one that kills crawl budget — is a perfectly valid sitemap full of URLs that shouldn't be there: noindex pages, duplicates, redirected URLs, faceted navigation, and old URLs from a migration nobody finished.

This analyzer fetches your sitemap.xml and your robots.txt, cross-references them, and reports the actual problems: how many URLs are listed, how many of those are reachable (non-404), how many are noindex (and therefore wasting space in your sitemap), how recent the lastmod dates are, and whether your robots.txt actually points at your sitemap. We also flag the "every URL has lastmod = today" anti-pattern, which trains Google to ignore your timestamps.
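
As a rough illustration of the cross-referencing step (not this tool's actual implementation), the Python sketch below parses a robots.txt and a sitemap from inline strings, counts the listed URLs, reads the Sitemap: lines, and finds sitemap URLs that robots.txt disallows. The sample file contents and the function name cross_reference are assumptions made for the example. Reachability and noindex checks need real HTTP requests and are left out.

```python
from urllib.robotparser import RobotFileParser
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical file contents; a real check would download these first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/private/report</loc></url>
</urlset>
"""


def cross_reference(robots_txt, sitemap_xml, user_agent="*"):
    """Compare the URLs a sitemap lists against the rules in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]

    return {
        "url_count": len(locs),
        # site_maps() returns None when robots.txt has no Sitemap: line.
        "declared_sitemaps": parser.site_maps() or [],
        # Listed in the sitemap, yet crawlers are told not to fetch them.
        "blocked_by_robots": [u for u in locs if not parser.can_fetch(user_agent, u)],
    }


result = cross_reference(ROBOTS_TXT, SITEMAP_XML)
print(result)
```

An empty declared_sitemaps list means robots.txt does not point at any sitemap, and every entry in blocked_by_robots is a URL the two files disagree about.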

Sitemaps don't get pages indexed. Submitting a URL via sitemap is a hint: Google indexes only the pages it judges worth indexing, regardless of submission. If you have pages in your sitemap that aren't getting indexed, the answer is almost never "submit again." It's "the page is too thin, too similar to existing pages, or too low-authority for Google to bother."

How to analyze any site's sitemap

Scan a sitemap.xml for size, freshness, and structural issues.

  1. Paste your URL

    Enter your full website URL into the input field at the top of the tool.

  2. Run the scan

    Click the scan button. The tool fetches your sitemap.xml and robots.txt and runs the analysis in the background.

  3. Review results

    Review the report. Each finding includes a plain-English explanation and a recommended fix.

  4. Apply fixes

    Implement the recommended changes on your site, then re-run the scan to confirm the issue is resolved.

Frequently asked questions

What does the analyzer check?
Sitemap presence, format validity, URL count, lastmod accuracy, priority distribution, robots.txt cross-reference, and orphan-page risk (URLs in sitemap but not linked internally).
How big can a sitemap be?
A single sitemap holds up to 50,000 URLs or 50 MB uncompressed. Larger sites should use a sitemap index that references multiple child sitemaps.
Should I include every page in the sitemap?
Only canonical, indexable, high-quality pages. Including thin pages, duplicates, or noindex URLs wastes crawl budget and can hurt site quality signals.
How often should lastmod be updated?
Only when the page actually changes. Setting every URL's lastmod to today's date trains Google to ignore your timestamps entirely.
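
One way to spot the "every lastmod is the same day" pattern is to measure how many lastmod values fall on a single date. The Python sketch below is a minimal illustration under stated assumptions: the sample sitemap, the function name lastmod_same_day_share, and the 0.9 threshold are all hypothetical, and a high share can also come from a genuine site-wide update.

```python
from collections import Counter
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap; three of the four lastmod values share one date.
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2024-05-01T08:30:00+00:00</lastmod></url>
  <url><loc>https://example.com/c</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/d</loc><lastmod>2023-11-12</lastmod></url>
</urlset>
"""


def lastmod_same_day_share(sitemap_xml):
    """Return (most common lastmod date, share of lastmod values on that date)."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    days = []
    for el in root.findall("sm:url/sm:lastmod", NS):
        text = (el.text or "").strip()
        if text:
            # Keep only the YYYY-MM-DD part of a full date or datetime value.
            days.append(text[:10])
    if not days:
        return None, 0.0
    day, count = Counter(days).most_common(1)[0]
    return day, count / len(days)


SUSPICIOUS_SHARE = 0.9  # assumed cut-off, not a documented threshold

day, share = lastmod_same_day_share(SAMPLE)
suspicious = share >= SUSPICIOUS_SHARE
print(day, share, suspicious)
```

Comparing the most common date against today's date, or against the previous scan, would make the check stricter.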
