
What Is a Sitemap and How Do You Create One for SEO?
A sitemap is one of the simplest and most effective SEO tools — and one of the most commonly neglected. This guide explains what XML and HTML sitemaps do, how to create them, and how to use them to improve Google indexation.
The Simple File That Helps Google Find Everything on Your Site
Search engine optimization has many complex, time-consuming components: building backlinks, creating comprehensive content, optimizing page speed, implementing structured data. And then there are the simple, quick wins — the basic technical SEO fundamentals that take an hour to implement and provide ongoing benefits for the life of the site.
A properly configured and submitted XML sitemap is one of those quick wins. It typically takes less than 30 minutes to set up correctly and benefits any website that wants its pages reliably indexed by Google. Yet a surprising percentage of websites either don't have one, have one that has never been submitted to Google, or have one full of URLs that shouldn't be in it.
This guide covers what sitemaps are, the two types (XML and HTML) and their different purposes, how to create each, how to submit your XML sitemap to Google, and the common mistakes that make sitemaps less effective than they should be.
What a Sitemap Is
A sitemap is a file that provides information about the pages, videos, and other files on your site and the relationships between them. Search engines like Google read sitemaps to help crawl your site more efficiently.
There are two distinct types of sitemaps that serve different purposes:
XML Sitemap: A machine-readable file written for search engines. It lists URLs with optional metadata: last modified date, change frequency, and priority. It usually lives at yoursite.com/sitemap.xml and is submitted to Google through Google Search Console.
HTML Sitemap: A human-readable page that lists the important pages on your website. It usually lives at yoursite.com/sitemap or /sitemap.html and serves as a navigation aid for visitors who can't find what they're looking for through the main navigation.
Both types are called "sitemaps" but serve fundamentally different audiences. The XML sitemap is for Google's crawlers. The HTML sitemap is for human visitors. This guide focuses primarily on XML sitemaps since they're the more SEO-relevant type.
Why XML Sitemaps Matter for SEO
A common misconception: "My site is already indexed by Google, so I don't need a sitemap." This conflates current indexation with ongoing indexation efficiency.
XML sitemaps benefit SEO in several specific ways:
Help Google Discover New Pages Faster
When you publish a new page, Google discovers it mainly by following links to it (internal links from pages it has already indexed, or links from other sites) or by finding it in your sitemap. If no existing indexed content links to the new page, Google might not discover it for weeks. With a sitemap that includes the new URL and an accurate lastmod date, Google is more likely to crawl it promptly the next time it reads your sitemap.
Critical for Large or Complex Sites
For sites with hundreds or thousands of pages, the sitemap tells Google about all your pages in one place rather than requiring Googlebot to discover them by following every internal link. This is particularly important for e-commerce sites with large product catalogs, news sites with thousands of articles, and documentation sites with extensive content libraries.
Communicate Content Freshness
The <lastmod> element in an XML sitemap tells Google when each URL was last significantly updated. For content sites that regularly update existing articles, this helps Google prioritize recrawling the pages that actually changed. Google has said it uses lastmod only when the values are consistently and verifiably accurate, so the signal is only as useful as your dates are honest.
Doesn't Hurt — Always Worth Having
Even for small sites where Google has already discovered all the pages through normal crawling, having a properly configured sitemap provides benefits at no cost. There's no downside to having an accurate sitemap.
What a Basic XML Sitemap Looks Like
An XML sitemap is a structured file that follows the Sitemap Protocol specification:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://scalify.ai/</loc>
    <lastmod>2026-03-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://scalify.ai/services/</loc>
    <lastmod>2026-02-28</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
The elements:
<loc>: Required. The full URL of the page, including the protocol (https://). Must match the canonical URL exactly.
<lastmod>: Optional but recommended. The date the URL was last significantly modified, in W3C Datetime format (a profile of ISO 8601); the date-only form YYYY-MM-DD is valid. Only update this when content actually changes. Artificially inflating lastmod dates to appear fresh is counterproductive, because Google disregards lastmod values that aren't accurate.
<changefreq>: Optional. A hint about how often the page changes: always, hourly, daily, weekly, monthly, yearly, or never. Google ignores this value; other search engines may treat it as a hint, not a directive.
<priority>: Optional. A value from 0.0 to 1.0 (default 0.5) indicating a URL's priority relative to other URLs on your site. Google ignores it, and setting everything to 1.0 provides no useful signal to any search engine, so this element has little practical value in modern sitemap implementations.
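If you generate the sitemap yourself, Python's standard library is enough. The sketch below is a minimal example, not a production generator: it assumes you already have a list of canonical URLs paired with their last-modified dates, and the function name build_sitemap is hypothetical. It writes only <loc> and <lastmod>, since those are the elements Google actually uses.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: List[Tuple[str, Optional[date]]]) -> str:
    """Return a <urlset> document for (canonical_url, last_modified) pairs.

    Only <loc> and <lastmod> are written; <changefreq> and <priority> are
    left out because Google ignores them.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree entity-escapes characters such as "&" automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    ET.indent(urlset)  # pretty-print with two-space indentation (Python 3.9+)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_sitemap([
        ("https://scalify.ai/", date(2026, 3, 15)),
        ("https://scalify.ai/services/", date(2026, 2, 28)),
    ]))
```

Writing the returned string to a file named sitemap.xml in the site's root directory gives you the same structure as the example above, minus the two elements Google ignores.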
Sitemap Index Files
A single XML sitemap file can contain up to 50,000 URLs and be up to 50MB uncompressed. For sites that exceed these limits, a sitemap index file references multiple individual sitemap files:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://scalify.ai/sitemap-pages.xml</loc>
    <lastmod>2026-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://scalify.ai/sitemap-posts.xml</loc>
    <lastmod>2026-03-15</lastmod>
  </sitemap>
</sitemapindex>
Large e-commerce sites commonly use sitemap index files to split product sitemaps by category, keep blog posts in a separate sitemap from products, and list dedicated image or video sitemaps alongside them.
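Splitting a large URL list across several files is mechanical. This Python sketch assumes you already have the full list of canonical URLs; chunk_urls and build_sitemap_index are hypothetical helper names. It enforces only the 50,000-URL limit, so a list of unusually long URLs could still push a single file past 50MB.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per file (the 50MB cap also applies)


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: List[str]) -> str:
    """Return a <sitemapindex> document listing the child sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    ET.indent(index)  # Python 3.9+
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```

For example, 120,000 product URLs would produce three chunks of 50,000, 50,000, and 20,000 URLs. You would write each chunk to its own sitemap file (hypothetically sitemap-products-1.xml through sitemap-products-3.xml) and pass those three file URLs to build_sitemap_index.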
How to Create an XML Sitemap
Platform-Generated Sitemaps (Most Common)
Most modern website platforms and CMS systems generate sitemaps automatically:
Webflow: Generates a sitemap automatically at yoursite.com/sitemap.xml. Includes all published pages. The sitemap updates automatically when pages are added or removed. No configuration required.
Shopify: Automatically generates a sitemap at yourshop.com/sitemap.xml. Includes product pages, collection pages, blog posts, and standard pages.
WordPress with Yoast SEO: Yoast SEO generates an XML sitemap automatically and provides controls for which post types, taxonomies, and individual pages to include or exclude.
WordPress with Rank Math: Similar automatic sitemap generation with detailed configuration options.
Squarespace: Generates a sitemap automatically at yoursite.com/sitemap.xml.
For sites on these platforms, verify the sitemap exists by navigating to /sitemap.xml (Yoast SEO and Rank Math serve theirs at /sitemap_index.xml) and confirming it returns a valid sitemap. Then submit it to Google Search Console.
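A quick scripted check can confirm that a sitemap URL returns real sitemap XML rather than, say, an HTML error page. This is a minimal sketch with hypothetical function names (fetch, check_sitemap); it checks only the root element, the 50,000-entry and 50MB limits, and that every <loc> is an absolute URL. It is not full schema validation.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NSMAP = {"sm": NS}
MAX_ENTRIES = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed


def fetch(url: str) -> bytes:
    """Download a sitemap (plain .xml; decompress .xml.gz files before checking)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def check_sitemap(data: bytes) -> List[str]:
    """Return problems found in a sitemap or sitemap index; empty means none found."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag == f"{{{NS}}}urlset":
        locs = root.findall("sm:url/sm:loc", NSMAP)
    elif root.tag == f"{{{NS}}}sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", NSMAP)
    else:
        # Catches HTML error pages and sitemaps missing the protocol namespace.
        return [f"unexpected root element {root.tag!r}"]
    problems = []
    if len(locs) > MAX_ENTRIES:
        problems.append(f"{len(locs)} entries exceeds the {MAX_ENTRIES} limit")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 50MB uncompressed")
    for loc in locs:
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"<loc> is not an absolute URL: {url!r}")
    return problems
```

Calling check_sitemap(fetch("https://yoursite.com/sitemap.xml")) should return an empty list for a healthy sitemap; anything else points to what needs fixing before you submit it.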
Manually Creating Sitemaps
For custom-built sites without a CMS, sitemaps can be:
Generated by a script: If the site is built with a framework (Next.js, Gatsby, etc.), the framework typically has built-in support or a sitemap plugin. Next.js supports a sitemap file convention in its App Router, and the community package next-sitemap is widely used; Gatsby has gatsby-plugin-sitemap.
Generated by a tool: For static sites, online tools like XML-Sitemaps.com can crawl your site and generate a sitemap file. Free tools work for small sites; larger sites need paid tiers or script-based generation.
Written manually: For very small sites (under 20 pages), writing the sitemap XML manually is feasible. Create a file named sitemap.xml with the XML structure shown above, and upload it to your site's root directory.
Submitting Your Sitemap to Google
Creating a sitemap provides no benefit until Google knows about it. Submission through Google Search Console is the standard method:
1. Log into Google Search Console for your property
2. Navigate to Indexing → Sitemaps
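3. Enter your sitemap's location in the "Add a new sitemap" field. For URL-prefix properties, Search Console pre-fills your site address, so you type only the path (for example, sitemap.xml); for Domain properties, enter the full URL.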
4. Click Submit
Google will attempt to fetch and process the sitemap. The Sitemaps report shows the submission date, the last read date, the status (for example Success, Has errors, or Couldn't fetch), and the number of discovered URLs. Opening an individual sitemap lets you view the Page indexing report filtered to that sitemap's URLs, which shows how many of them are indexed.
The gap between submitted and indexed URLs is important: if you submitted 200 URLs but only 80 are indexed, 120 URLs are excluded from Google's index. The Page indexing report (formerly Coverage) lists the reason for each exclusion: noindex tags, duplicates where Google chose a different canonical, pages crawled but not indexed, crawl errors, and so on.
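Sitemaps can also be referenced in robots.txt with a line such as "Sitemap: https://yoursite.com/sitemap.xml" (the full URL is required), which gives every crawler another discovery path, and submitted to Bing through Bing Webmaster Tools.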
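Python's standard library can read the robots.txt Sitemap directive, which is a handy way to confirm that crawlers will find it. A minimal sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() was added); sitemaps_from_robots is a hypothetical helper name and the robots.txt content is illustrative.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; the Sitemap line applies to all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://scalify.ai/sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return every sitemap URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))  # ['https://scalify.ai/sitemap.xml']
```

To check a live site, you could instead call set_url() with the robots.txt address and read() before site_maps(); the parsing step is the same.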
What Should (and Shouldn't) Be in Your Sitemap
Include:
- All pages you want Google to index and rank
- All canonical URLs — the preferred version of each URL
- Pages with unique, valuable content
- Product pages, service pages, blog posts, landing pages
Exclude:
- Noindexed pages (including pages in your sitemap while also noindexing them sends contradictory signals)
- Redirecting URLs (301 or 302 redirects)
- Pages returning 4xx or 5xx errors
- Duplicate content pages (only the canonical URL)
- Pagination pages (generally — unless you want Google to crawl all paginated content)
- Thin or low-value pages you wouldn't want ranking in Google
- Admin, login, and user-specific pages
- Thank-you pages, confirmation pages, and other conversion endpoints
- Development or staging URLs
A sitemap that contains only URLs you genuinely want indexed is more useful than one containing every URL on the site. Including low-quality URLs signals to Google that these are pages worth crawling; if they're pages you'd prefer Google didn't focus on, leave them out.
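The include/exclude rules above can be partly applied in code before the sitemap is written. The sketch below assumes a hypothetical crawl record (the Page class and its fields are illustrative, not from any particular tool) holding each URL's HTTP status, noindex flag, and declared canonical. It covers only the rules that can be checked automatically; judgment calls such as thin content, pagination, or thank-you pages still need a manual exclusion list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical crawl record for one URL; field names are illustrative."""
    url: str
    status: int                      # HTTP status when fetched without following redirects
    noindex: bool = False            # noindex meta tag or X-Robots-Tag header present
    canonical: Optional[str] = None  # rel=canonical target, if declared


def sitemap_urls(pages: List[Page]) -> List[str]:
    """Keep only indexable, self-canonical, 200-status URLs, without duplicates."""
    keep: List[str] = []
    seen = set()
    for p in pages:
        if p.status != 200:
            continue  # drops 3xx redirects and 4xx/5xx errors
        if p.noindex:
            continue  # would contradict inclusion in the sitemap
        if p.canonical is not None and p.canonical != p.url:
            continue  # a duplicate that points to another canonical URL
        if p.url not in seen:
            seen.add(p.url)
            keep.append(p.url)
    return keep
```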
Common Sitemap Mistakes
Never submitting it: The most common mistake. A sitemap sitting at /sitemap.xml that's never been submitted to Search Console provides minimal benefit — Google may discover it through robots.txt or general crawling, but submission is the reliable path.
Including noindexed URLs: A URL listed in the sitemap with a noindex meta tag on the page sends contradictory signals. Google will typically honor the noindex and leave the page out of its index, but listing it wastes crawl resources and muddies what your sitemap tells Google you want indexed.
Including redirect URLs: Listing redirect URLs in the sitemap sends Google through a redirect (or a chain of them) instead of straight to the final destination. Only canonical, final destination URLs belong in the sitemap.
Incorrect or outdated lastmod dates: Updating lastmod to today's date on every sitemap regeneration — regardless of whether content actually changed — dilutes the freshness signal. Only update lastmod when content actually changes meaningfully.
Not updating the sitemap after content changes: A sitemap that doesn't include recently published pages or doesn't reflect removed pages is outdated. CMS-generated sitemaps are typically dynamic and update automatically; statically generated or manually maintained sitemaps need regular updates.
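One way to avoid the inflated-lastmod mistake is to store a hash of each page's main content and change the date only when the hash changes. This is a minimal sketch under stated assumptions: the stored state dictionary and the function names are hypothetical, and what counts as "main content" (body text, excluding timestamps, sidebars, and ads) is up to you.

```python
import hashlib
from datetime import date
from typing import Dict, Tuple

# Hypothetical stored state from the previous build: url -> (content_hash, lastmod).
State = Dict[str, Tuple[str, date]]


def content_hash(main_content: str) -> str:
    """Hash only the meaningful content, not timestamps, ads, or sidebars."""
    return hashlib.sha256(main_content.encode("utf-8")).hexdigest()


def update_lastmod(state: State, url: str, main_content: str, today: date) -> date:
    """Return the lastmod to publish for `url`, bumping it only when content changed."""
    new_hash = content_hash(main_content)
    previous = state.get(url)
    if previous is not None and previous[0] == new_hash:
        return previous[1]          # unchanged: keep the old date
    state[url] = (new_hash, today)  # new or changed page: record today's date
    return today
```

Because any change to the hashed text bumps the date, a one-character typo fix would count as an update; if that matters, hash a normalized version of the content or allow a manual override.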
The HTML Sitemap: For Visitors, Not Crawlers
The HTML sitemap is a supplementary navigation page — typically a simple list of links to every important page on the site, organized logically. It serves visitors who can't find what they're looking for through the main navigation, particularly on large content sites where the main navigation represents only a fraction of available content.
HTML sitemaps are less critical than they once were (when search engine crawling depended more heavily on link discovery). For most modern websites with good internal linking, the HTML sitemap is a nice-to-have rather than a necessity. For large sites, it serves a genuine user need.
Implementation: a simple page with a well-organized hierarchy of links. It doesn't need to be exhaustive: include every important section and key page, but skip individual blog posts or products when there are hundreds of them.
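For sites that build pages from structured data, the HTML sitemap can be rendered from the same page list. A minimal Python sketch, assuming a hypothetical input that maps section names to (title, URL) pairs; html_sitemap is an illustrative name, and it caps each section at a configurable number of links, in line with the advice to skip long lists of individual posts or products.

```python
from html import escape
from typing import Dict, List, Tuple

# Hypothetical input: section name -> list of (page title, URL) pairs.
Sections = Dict[str, List[Tuple[str, str]]]


def html_sitemap(sections: Sections, max_links_per_section: int = 20) -> str:
    """Render a grouped list of links; sections longer than the cap are truncated."""
    parts = ["<h1>Sitemap</h1>"]
    for section, links in sections.items():
        parts.append(f"<h2>{escape(section)}</h2>")
        parts.append("<ul>")
        for title, url in links[:max_links_per_section]:
            # escape() also escapes quotes, so URLs are safe inside href="...".
            parts.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)
```

The returned fragment would be placed inside your normal page template, so the sitemap page keeps the site's header, footer, and styling.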
The Bottom Line
An XML sitemap is a simple, important technical SEO foundation — a file that helps Google discover, understand, and prioritize crawling your site's content. Create it (most platforms do this automatically), submit it to Google Search Console, ensure it includes only canonical URLs you want indexed, and update it when your site's content changes significantly.
The sitemap alone won't rank your site — content quality, technical performance, and links do that. But a properly configured sitemap ensures Google can efficiently find and consider all your content, removing a simple technical barrier that could otherwise limit indexation.
Every website Scalify launches includes a properly configured sitemap, submitted to Google Search Console as part of the launch process so Google can start crawling and indexing the site right away rather than waiting to discover it organically.






