How to Create XML Sitemaps: Step-by-Step Guide
Practical step-by-step instructions to create, validate, submit, and maintain XML sitemaps so search engines index your site reliably.

Creating clear XML sitemaps helps search engines find and index the pages you want ranked. This guide explains how to create XML sitemaps, validate them, submit them to Google and Bing, and keep them up to date so indexing matches your site structure and publishing cadence. You’ll learn practical checks (50,000-URL limits, 50MB uncompressed limit), tools for generation, validation steps, submission walkthroughs, and maintenance patterns for sites of any size.
TL;DR:
-
Create a sitemap that includes only indexable canonical URLs and split with sitemap index files when you exceed 50,000 URLs or 50MB uncompressed.
-
Validate XML for UTF-8 encoding, absolute URLs, and ISO 8601 lastmod; test with Search Console and an XML validator before submitting.
-
Submit to Google and Bing, monitor coverage reports weekly, and automate updates via CMS hooks or scheduled rebuilds to avoid stale sitemaps.
Step 1: Gather Prerequisites — What You Need Before Creating a Sitemap
Before you produce anything, collect the rules and goals that define which URLs belong in the sitemap. A sitemap matters because it signals to search engines which pages you want crawled and indexed; it does not override robots.txt or canonical tags, but it helps prioritize crawl discovery.
Checklist of things to gather:
-
Canonicalization rules: site-wide canonical tag patterns and any canonical-to-other-domain rules.
-
Noindex lists: pages or sections explicitly set to noindex (search filters, admin pages, development URLs).
-
Pagination and parameter rules: how paginated lists are handled (rel="next/prev" or canonical to page 1) and whether parameterized URLs are canonical.
-
Crawl budget considerations: expected crawl frequency for large sites and heavy server load times.
-
URL inventory: spreadsheet or live index of current pages with status codes and canonical targets.
Key specs to keep in mind:
-
A single sitemap file can contain up to 50,000 URLs and must be no larger than 50MB (52,428,800 bytes) uncompressed; for larger sites use sitemap index files that point to multiple sitemap files. See the official protocol at sitemaps.org.
-
Robots.txt should point to your sitemap(s) with a Sitemap: line that contains the full absolute URL so crawlers can find them automatically.
Practical ways teams keep the inventory updated:
-
Use a site audit tool to export URL lists and discover noindex or redirect patterns.
-
Keep a content inventory (pillar and cluster inventory) to decide which pages are mission-critical for the sitemap; this helps prioritize high-value landing pages and reduces sitemap bloat. For organizing that inventory, see the guide on pillar and cluster inventory.
Step 2: Choose Sitemap Format and Scope (XML, RSS, or Sitemap Index)
Decide whether to use a standard XML sitemap, an RSS/Atom feed for frequently updated content, or a sitemap index for large sites.
When to pick each format:
-
XML sitemap: Best for most sites. It explicitly supports the <loc>, <lastmod>, <changefreq>, and <priority> fields and follows the sitemaps.org protocol.
-
RSS/Atom feeds: Useful for news or frequently updated blogs where crawl rate should favor the freshest items. Search engines can consume RSS for discovery, but XML sitemaps give more control.
-
Sitemap index: Required when you have over 50,000 URLs or a single sitemap would exceed 50MB uncompressed. Group sitemaps by content type, section, or date ranges to simplify monitoring.
Sitemap metadata recommendations:
-
Lastmod: Use ISO 8601 timestamps (YYYY-MM-DD or YYYY-MM-DDThh:mmTZD). Google recommends accurate lastmod values—avoid stamping everything with the current date.
-
Changefreq: Optional. Use sparingly; it's a hint, not a directive. One or two values are often sufficient (e.g., "daily" for blog home page, "monthly" for evergreen docs).
-
Priority: Many teams omit priority, and Google's documentation states that it ignores <priority> and <changefreq> values. If you include it for other crawlers, the protocol allows values from 0.0 to 1.0 (default 0.5); align them with your content inventory priorities.
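As a rough illustration of the metadata advice above, here is a minimal sketch that writes only <loc> and <lastmod>, using Python's standard library. The function name build_urlset and the example.com pages are hypothetical; the important assumption is that each timestamp comes from the content record itself, not from the moment the sitemap is generated.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build sitemap XML (bytes) from (url, last_modified) pairs.

    last_modified must be a timezone-aware datetime taken from the
    content record, not the time the sitemap was generated.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit a default xmlns, no prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        # isoformat() on an aware datetime yields an ISO 8601 value with offset.
        ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = (
            last_modified.isoformat(timespec="seconds")
        )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages with their real modification times.
pages = [
    ("https://example.com/", datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)),
    ("https://example.com/docs/setup", datetime(2023, 11, 2, 16, 0, tzinfo=timezone.utc)),
]
sitemap_xml = build_urlset(pages)
print(sitemap_xml.decode("utf-8"))
```

Leaving out changefreq and priority keeps the file smaller and avoids publishing hints that you then have to maintain.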
URL selection rules:
-
Only include canonical, indexable pages (200 OK, not blocked by robots.txt, and not marked noindex).
-
Avoid listing parameterized URLs unless they are canonical and valuable.
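The selection rules above can be sketched as a simple filter over a URL inventory. The PageRecord fields below are hypothetical stand-ins for whatever your audit tool or database exposes, so treat this as a starting point and not a complete indexability check.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    # Hypothetical inventory fields; rename to match your own data source.
    url: str
    status: int
    canonical: str
    noindex: bool = False
    robots_blocked: bool = False


def is_sitemap_eligible(page):
    """True only for 200 OK, indexable pages that are their own canonical."""
    return (
        page.status == 200
        and not page.noindex
        and not page.robots_blocked
        and page.canonical == page.url
    )


inventory = [
    PageRecord("https://example.com/pricing", 200, "https://example.com/pricing"),
    PageRecord("https://example.com/pricing?ref=ad", 200, "https://example.com/pricing"),
    PageRecord("https://example.com/search", 200, "https://example.com/search", noindex=True),
    PageRecord("https://example.com/old-page", 301, "https://example.com/old-page"),
]
eligible = [page.url for page in inventory if is_sitemap_eligible(page)]
print(eligible)
```

The parameterized URL is dropped here because its canonical points at the clean path, which matches the rule of listing parameterized URLs only when they are themselves canonical.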
If you're unsure about format choices, Google Search Central's sitemap guide provides clear examples and guidance about fields and limits.
Step 3: Generate the Sitemap with Automated Tools or Manual Methods
There are three common approaches to generating sitemaps: CMS-managed, crawler-based, and programmatic endpoints. Choose based on site size, publishing velocity, and engineering resources.
1) Generate with CMS (WordPress, headless CMS)
-
WordPress: Core has generated a basic sitemap at /wp-sitemap.xml since version 5.5, and widely used SEO plugins expose their own endpoint (commonly /sitemap.xml) with more control. Configure whichever you use to exclude noindex content types and to use post modified dates for lastmod.
-
Headless CMS: Many headless platforms provide webhooks that can trigger a sitemap rebuild on publish/unpublish events. If yours does not, schedule a server-side job to regenerate sitemap files nightly.
-
Pros: Low engineering effort, automatic updates on publish. Cons: May require plugin configuration and QA to exclude drafts and staging URLs.
-
When using a CMS, ensure the sitemap URL is stable (e.g., /sitemap.xml) and cache it to reduce load.
2) Crawl-and-generate (Screaming Frog, Sitebulb, custom scripts)
-
Steps: Crawl the site, export a CSV of crawled URLs, filter to indexable pages, then transform into XML using a simple script or the tool’s sitemap export feature.
-
Screaming Frog and Sitebulb can produce sitemaps and respect robots.txt and canonical rules if configured properly.
-
Pros: Good for one-off audits and validating what actually exists on the live site. Cons: Not ideal for high-frequency updates; manual steps required.
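For the "transform into XML using a simple script" step, something like the following could work. The column names Address, Status Code, and Indexability are assumptions about the crawler export; adjust them to whatever headers your tool actually writes.

```python
import csv
import io
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def crawl_csv_to_sitemap(csv_text):
    """Turn a crawl export into sitemap XML, keeping only indexable 200 pages."""
    rows = csv.DictReader(io.StringIO(csv_text))
    urls = [
        row["Address"]
        for row in rows
        if row["Status Code"] == "200" and row["Indexability"] == "Indexable"
    ]
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    # escape() converts &, < and > so URLs with query strings stay valid XML.
    lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical export with assumed column names.
export = """Address,Status Code,Indexability
https://example.com/,200,Indexable
https://example.com/old,301,Non-Indexable
https://example.com/list?cat=a&page=2,200,Indexable
https://example.com/search,200,Non-Indexable
"""
sitemap_text = crawl_csv_to_sitemap(export)
print(sitemap_text)
```

Escaping matters more than it looks: an unescaped ampersand in a single URL is enough to make the whole file fail to parse.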
3) Programmatic generation for large sites
-
Pattern: database → filter for indexable pages → generate XML fragments → serve via a sitemap endpoint or save as static files behind a CDN. Use a sitemap index to reference multiple shard sitemaps.
-
Implementation tips: Cache generated sitemaps and serve with proper caching headers (Cache-Control). Use pagination of sitemap files (e.g., /sitemap-posts-1.xml) and a /sitemap-index.xml that lists them.
-
Pros: Scales well and stays fresh. Cons: Requires engineering to implement proper caching and to ensure the generation job respects canonical rules and noindex lists.
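The sharding pattern described above might look like this in outline. The helper names, the example.com base URL, and the in-memory URL list are illustrative assumptions; a production job would stream rows from the database and write each shard to disk or object storage.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file


def shard_urls(urls, shard_size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most shard_size."""
    return [urls[i:i + shard_size] for i in range(0, len(urls), shard_size)]


def build_sitemap_index(base_url, shard_count, prefix="sitemap-posts"):
    """Build a sitemap index (bytes) listing numbered shard files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for number in range(1, shard_count + 1):
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = (
            f"{base_url}/{prefix}-{number}.xml"
        )
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical inventory of 120,001 post URLs.
all_urls = [f"https://example.com/posts/{n}" for n in range(120_001)]
shards = shard_urls(all_urls)
index_xml = build_sitemap_index("https://example.com", len(shards))
print(len(shards), [len(shard) for shard in shards])
```

Splitting by count alone does not guarantee the 50MB limit, so it is still worth checking the byte size of each written shard.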
Avoid including non-indexable URLs and duplicate parameterized links. When building programmatic sitemaps, review programmatic strategies so the sitemap structure reflects content intent; for help deciding structure for large sets of pages, see our post on programmatic comparison pages and programmatic SEO strategies. If your site uses AI-driven content or frequent batch publishing, review AI for new sites and AI vs ChatGPT to understand how update frequency affects your sitemap needs. Also evaluate tools that maintain a live URL index when choosing automation; see what to look for in an AI SEO tool.
Step 4: Validate and Test Your Sitemap Before Publishing
Testing prevents simple errors from blocking indexing. Run multiple checks before you point Search Console at a sitemap.
Validation steps:
-
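XML well-formedness: Run the sitemap through an XML validator or the W3C Markup Validation Service to confirm correct tags and UTF-8 encoding. Validating against the XSD schema published on sitemaps.org also catches misspelled or misplaced elements.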
-
URL formatting: Ensure all <loc> values are absolute URLs (including protocol and domain) and return 200 status codes for canonical pages.
-
Lastmod encoding: Verify lastmod timestamps follow ISO 8601 and reflect the content change date rather than the sitemap generation date.
-
Encoding: Confirm the file is UTF-8 encoded with no byte-order-mark (BOM).
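Several of these checks are easy to script before you reach for external tools. The sketch below is one possible pre-flight check, not a full schema validation: the function name validate_sitemap is hypothetical, it does not fetch URLs to confirm 200 responses, and its date pattern is deliberately stricter than the protocol (it rejects year-only and year-month values).

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Accepts YYYY-MM-DD, optionally followed by a time and a timezone designator.
LASTMOD_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def validate_sitemap(data):
    """Return a list of human-readable problems found in sitemap bytes."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte-order mark")
        data = data[3:]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"loc is not an absolute URL: {loc!r}")
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not LASTMOD_PATTERN.fullmatch(lastmod.strip()):
            problems.append(f"lastmod is not ISO 8601: {lastmod!r}")
    return problems


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>/about</loc><lastmod>15/01/2024</lastmod></url>
</urlset>"""
for problem in validate_sitemap(sample):
    print(problem)
```

Running a check like this in the build pipeline means parsing errors surface before Search Console ever sees the file.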
Useful tools:
-
W3C validator: validate XML structure and encoding at W3C markup validation service.
-
Search Console’s sitemap upload will also report parsing errors when you add a sitemap.
QA Checklist for Common Sitemap Issues:
-
Noindex detection: Confirm the sitemap excludes pages with meta noindex or X-Robots-Tag noindex headers. Use your content QA process to catch accidental inclusions; see our content QA process.
-
Canonical mismatch: Compare the sitemap URL against the page canonical; if canonical points elsewhere, remove or correct the sitemap entry.
-
Redirects and soft 404s: Remove URLs that redirect or return soft 404s. Run a subset through a crawler to verify.
-
File size and broken shards: Confirm each sitemap shard is under 50MB uncompressed and under 50,000 URLs.
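The noindex and canonical checks in this list can be approximated with the standard library's HTML parser. This is a sketch under simplifying assumptions: the page HTML and the X-Robots-Tag header value are passed in by the caller (fetching is left out), and audit_page and IndexabilityParser are hypothetical names.

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Collect the canonical URL and any meta robots noindex from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def audit_page(sitemap_url, html, x_robots_tag=""):
    """Return the reasons a sitemap entry should be removed or corrected."""
    parser = IndexabilityParser()
    parser.feed(html)
    issues = []
    if parser.noindex or "noindex" in x_robots_tag.lower():
        issues.append("noindex")
    if parser.canonical and parser.canonical != sitemap_url:
        issues.append("canonical mismatch")
    return issues


page_html = (
    '<html><head><link rel="canonical" href="https://example.com/a">'
    '<meta name="robots" content="noindex, follow"></head><body></body></html>'
)
print(audit_page("https://example.com/a?ref=1", page_html))
```

The string comparison is exact, so normalize trailing slashes and host casing first if your canonical tags and sitemap entries are generated by different systems.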
For teams using automated publishing pipelines, incorporate a runnable publishing QA checklist before sitemap updates are pushed; our publishing QA checklist shows a practical workflow. Test a small sitemap first (e.g., a single section) and confirm it parses and is accepted in Search Console before submitting larger indexes.
Step 5: Submit Sitemaps to Search Engines and Monitor Indexing
Submission is simple, but monitoring is the ongoing task that uncovers indexing issues.
How to Submit in Google Search Console:
-
Open the correct property (domain property preferred).
-
Go to Indexing → Sitemaps (labelled Index → Sitemaps in older versions of the interface).
-
Enter the sitemap path (for example, sitemap-index.xml or sitemap.xml) and click Submit.
-
Check the Sitemaps report for parsing status, discovered URLs, and errors.
After submission, use the Page indexing report (formerly called Coverage) to interpret the status. Distinguish between:
-
Indexed: pages accepted and indexed.
-
Discovered — currently not indexed: found but not yet crawled or not selected for indexing.
-
Submitted but blocked by robots.txt: indicates the sitemap contains URLs blocked from crawling.
Submitting to Bing and Other Engines:
-
Bing Webmaster Tools has a Sitemap submission form; follow the steps at How to submit sitemaps to Bing.
-
For other search engines, follow their webmaster tool submission flows or ensure robots.txt lists your sitemap so crawlers can discover it.
Monitoring cadence and metrics:
-
Check coverage and sitemap errors weekly for active sites; monthly may be enough for smaller, low-change sites.
-
Track metrics: number of submitted URLs, number of indexed URLs, crawl errors, server response times, and last fetch dates.
-
Use analytics to measure the impact of sitemap changes on organic discovery and indexing trends; see our SEO analytics guide for how to correlate sitemap updates with indexation and traffic.
When Search Console reports errors, prioritize fixes: broken XML, blocked URLs, or canonical conflicts. After you fix issues, resubmit the sitemap in Search Console and watch for status changes over the next few days.
Step 6: Maintain and Automate Sitemap Updates
Sitemaps must reflect the live site. Decide on update rules that balance freshness and server cost.
Update frequency and automation rules:
-
CMS-driven sitemaps: Trigger updates on publish/unpublish events so new pages appear quickly.
-
Scheduled rebuilds: For mid-size sites, nightly rebuilds reduce complexity and avoid transient errors.
-
On-demand endpoints: For very large sites with frequent changes, provide dynamic endpoints that generate sitemaps on request but cache results to limit load.
When to Regenerate vs Incremental Updates:
-
Regenerate full sitemaps after large structural changes (site migration, taxonomy changes).
-
Use incremental updates for content additions or deletions — add or remove entries in the affected sitemap shard and update the lastmod for the shard in the sitemap index.
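One way to picture the incremental path is as bookkeeping on two small structures: the set of URLs in each shard and the lastmod recorded for each shard in the index. The sketch below keeps both in memory and uses hypothetical names throughout; a real job would persist them and rewrite only the changed shard file.

```python
from datetime import datetime, timezone


def apply_change(shards, shard_lastmod, shard_name, url, action, changed_at):
    """Add or remove one URL and bump lastmod only for the affected shard."""
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action}")
    entries = shards.setdefault(shard_name, set())
    size_before = len(entries)
    if action == "add":
        entries.add(url)
    else:
        entries.discard(url)
    changed = len(entries) != size_before
    if changed:
        # Only a real change updates the shard's lastmod in the index.
        shard_lastmod[shard_name] = changed_at.isoformat(timespec="seconds")
    return changed


# Hypothetical state: shard file name -> URLs, and shard file name -> lastmod.
shards = {
    "posts-2024-01.xml": {"https://example.com/blog/a"},
    "posts-2024-02.xml": {"https://example.com/blog/b"},
}
shard_lastmod = {
    "posts-2024-01.xml": "2024-01-20T08:00:00+00:00",
    "posts-2024-02.xml": "2024-02-03T08:00:00+00:00",
}
apply_change(
    shards, shard_lastmod, "posts-2024-02.xml",
    "https://example.com/blog/c", "add",
    datetime(2024, 2, 10, 12, 0, tzinfo=timezone.utc),
)
print(shard_lastmod)
```

Because a no-op change leaves lastmod alone, repeated runs of the same job do not make unchanged shards look fresh.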
Automation best practices:
-
Use sitemap index files to shard by content type or date (e.g., posts-YYYY-MM.xml) so you can update a shard without touching others.
-
Log sitemap generation events and submissions for auditability. Keep a history of the sitemap index to compare submitted vs actual pages.
-
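Do not rely on sitemap pings: Google deprecated its sitemap ping endpoint in 2023. After you publish major batches, keep lastmod values accurate and resubmit the sitemap in Search Console to encourage recrawl.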
Maintenance tips:
-
Reconcile sitemap counts with indexed counts periodically. Significant discrepancies can indicate crawling barriers or indexation quality issues.
-
For local business pages, keep listings and citation-linked landing pages in sync with sitemaps; see our guidance for local citations.
-
If deciding between manual upkeep or tooling, review trade-offs in hire writers or use tools and read about automation limits to avoid over-automation without QA.
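The reconciliation tip above comes down to comparing two sets of URLs. Here is a minimal sketch, assuming you can obtain a list of indexed URLs from a Search Console export or similar source; the function name and sample data are hypothetical.

```python
def reconcile(submitted_urls, indexed_urls):
    """Compare submitted sitemap URLs with URLs reported as indexed."""
    submitted = set(submitted_urls)
    indexed = set(indexed_urls)
    matched = submitted & indexed
    return {
        "submitted_not_indexed": sorted(submitted - indexed),
        "indexed_not_submitted": sorted(indexed - submitted),
        "indexed_ratio": len(matched) / len(submitted) if submitted else 0.0,
    }


# Hypothetical data: four submitted URLs, three of them indexed,
# plus one indexed URL that is missing from the sitemap.
report = reconcile(
    [
        "https://example.com/",
        "https://example.com/pricing",
        "https://example.com/docs",
        "https://example.com/blog/thin-page",
    ],
    [
        "https://example.com/",
        "https://example.com/pricing",
        "https://example.com/docs",
        "https://example.com/legacy",
    ],
)
print(report)
```

URLs that are indexed but not submitted are worth a look too, since they often reveal sections that were left out of the sitemap by mistake.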
Troubleshooting: Common Sitemap Mistakes and How to Fix Them
Here are frequent problems and concrete fixes.
Sitemap Contains Non-indexable URLs
-
Symptom: Search Console shows many "submitted but blocked" or "submitted — not indexed."
-
Fix: Crawl a sample of the sitemap entries. Remove pages with meta noindex, X-Robots-Tag noindex headers, or robots.txt blocks. Rebuild and resubmit.
Large Sitemap Files and Server Timeouts
-
Symptom: Sitemap fetch fails or times out; server logs show long processing times.
-
Fix: Split into multiple shard files under 50,000 URLs each. Serve pre-generated static sitemap files from a CDN and set cache headers. If using a dynamic endpoint, add server-side caching to reduce load.
Incorrect Lastmod or Wrong Canonical URLs
-
Symptom: Search Console reports canonical mismatch or lastmod dates appear unrelated to content changes.
-
Fix: Align lastmod to the actual content update date, not the sitemap generation time. Ensure sitemap entries use the canonical URL for each page. Run a comparison between sitemap entries and live canonical tags.
Search Console shows "Discovered — currently not indexed"
-
Symptom: URLs appear in sitemaps but are not getting indexed.
-
Diagnostic steps: Check content quality signals (thin content, duplicate templates), check for crawl budget constraints, and examine server performance during crawls.
-
Fixes: Improve page content, consolidate duplicates, and use internal linking to help crawlers find important pages. Monitor after fixing; if many pages are affected, prioritize high-value pages first.
Programmatic Sitemap Pitfalls
-
Symptom: Large auto-generated sections lead to thin pages being indexed or sitemap churn with low-value URLs.
-
Fix: Apply stricter filters when programmatically generating sitemaps. Review programmatic strategies and common failure modes in why programmatic SEO fails. Ensure your generation logic respects canonical rules and content QA.
Quick diagnostic checklist:
-
Crawl 100 random sitemap URLs and inspect HTTP status, canonical tags, and meta robots.
-
Compare sitemap vs robots.txt for contradictions.
-
Review server logs for sitemap fetch requests and response codes.
-
Re-submit and monitor changes for at least 72 hours.
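The robots.txt comparison in this checklist can be scripted with Python's built-in robots parser. Note the assumptions: the robots.txt text is supplied by the caller, check_against_robots is a hypothetical helper name, and the standard-library parser uses simple prefix matching, so it may not mirror how a given search engine treats wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def check_against_robots(robots_txt, sitemap_urls, user_agent="*"):
    """Return (sitemaps declared in robots.txt, sitemap URLs it blocks)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when absent
    blocked = [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]
    return declared, blocked


# Hypothetical robots.txt and sitemap entries.
robots_txt = """User-agent: *
Disallow: /search
Sitemap: https://example.com/sitemap-index.xml
"""
declared, blocked = check_against_robots(
    robots_txt,
    ["https://example.com/blog/post-1", "https://example.com/search?q=shoes"],
)
print("declared sitemaps:", declared)
print("blocked sitemap URLs:", blocked)
```

Any URL that lands in the blocked list is a contradiction to resolve: either remove it from the sitemap or change the robots.txt rule.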
The Bottom Line
Creating and maintaining XML sitemaps is a one-time setup plus ongoing monitoring task: build sitemaps that list only canonical, indexable URLs, validate them thoroughly, submit to Search Console and Bing, and automate updates with sensible caching and QA. For immediate action, prioritize getting a valid sitemap.xml in place, submit it to Google, and check coverage reports weekly to catch issues early.
Frequently Asked Questions
How often should I update my sitemap?
Update frequency depends on how often your content changes. For news-heavy or high-velocity sites, update the sitemap on publish (or use an RSS feed for discovery). For documentation or evergreen content, nightly or weekly rebuilds are usually sufficient. Monitor the Search Console coverage report after changes to ensure the update pattern is working.
Can I include parameterized URLs in a sitemap?
You can include parameterized URLs only if they are canonical and return unique, indexable content. If parameters create the same content as a canonical path, exclude them to avoid duplicate indexing. Use URL inspection and a crawler sample to confirm which parameterized URLs are serving indexable content.
What does "Discovered — currently not indexed" mean?
This status means the search engine knows the URL exists (from the sitemap or links) but has not indexed it yet—often due to crawl prioritization, perceived low content value, or temporary crawl limits. Check page quality, internal linking, and server response times; prioritize fixes for high-value pages and resubmit the sitemap after corrections.
Do I need both XML sitemaps and RSS feeds?
Not always. XML sitemaps are the standard for site indexing control and should be your baseline. RSS or Atom feeds are useful for sites with very frequent updates because they can help search engines discover fresh content faster. Use both if you need fresh-discovery signals plus structured indexing control.
External resources used in this article:
-
How to submit sitemaps to Bing
-
W3C markup validation service