A lot of websites have a dedicated web page containing links to all of the site's internal pages. This is primarily used to help users navigate around a website and can be very useful, especially on large websites containing hundreds of pages. Additionally, this type of Sitemap provides extra navigation links for search engines, helping them find more of your site's internal pages.
However, in 2005 Google announced an 'approved' XML Sitemap format as the preferred method for indexing websites, and this was quickly adopted by Yahoo and MSN. An XML Sitemap is a file which is uploaded to your server. The search engines will find this file and use it as their preferred navigation route when indexing your web pages.
I review many websites where a number of internal pages have not been indexed by Google, and installing an XML Sitemap will help the search engines to find these non-indexed web pages.
Until now, there has been very little research into how effective XML Sitemaps really are. However, Google has recently published a detailed XML Sitemaps study, which provides us with an insight into their effectiveness.
The purpose of the Google study was to measure XML Sitemap usage over the last few years in order to determine how Sitemap files have improved the following:
- Coverage – how effective Sitemaps are at helping Google crawl the web more deeply and find new content that it might not have found otherwise.
- Freshness – whether Sitemaps help Google crawl new or updated content faster, compared with the normal crawl.
The study was based on three large websites: Amazon, CNN, and PubMed.
Amazon's Sitemaps include around 20 million URLs, and Amazon makes every effort to indicate the best URL versions of product pages in its XML Sitemaps.
CNN's approach to XML Sitemaps focuses on helping search engines find the many new URLs which are added daily.
PubMed lists a large archive of URLs in its XML Sitemaps; however, these are only updated on a monthly basis.
This is quite a detailed study with the final paper being 10 pages long. It is a great read if you are interested in understanding how Google Sitemaps work and how they can benefit your website. Download a copy here.
Some interesting facts from the study:
- Approximately 35 million Sitemaps were published, as of October 2008.
- The 35 million Sitemaps include several billion URLs.
- The most popular Sitemap formats are XML (77%), unknown (17.5%), URL list (3.5%), Atom (1.6%) and RSS (0.11%).
- 58% of URLs in Sitemaps contain the last modification date.
- 7% of URLs contain the change frequency field.
- 61% of URLs contain the priority field.
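The three optional fields mentioned above (last modification date, change frequency, and priority) are simply child elements of each `<url>` entry in a Sitemap file. As a minimal sketch of what such a file contains, here is one way to generate one in Python; the example URLs and values are hypothetical placeholders, not taken from the study.

```python
# Minimal sketch: building a sitemap.xml containing the optional
# lastmod, changefreq and priority fields discussed above.
# All URLs below are hypothetical examples.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: a list of dicts, each with a required 'loc' key and
    optional 'lastmod', 'changefreq' and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Only emit the fields a page actually provides -- the study
        # shows most real Sitemaps omit some of them.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2008-10-01",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "https://www.example.com/about"},
])
print(sitemap_xml)
```

The resulting file would then be uploaded to the web server root (and typically referenced from robots.txt) so the search engines can find it.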
If you are not using XML Sitemaps on your website, this study highlights why you should consider adding them.