Sitemap

A few days ago, I was playing with my robots.txt and started doing some research. While reading the Wikipedia entry, I noticed that I could provide a Sitemap, which apparently Google, Yahoo!, and MSN would read. A sitemap is basically a list of all your pages, so search engine bots don’t have to slowly crawl the site to find every page. For some reason, both Yahoo! and MSN/Live have problems indexing my HD-Trailers.net site, so I thought maybe a sitemap would help. Google also has a problem indexing my Gallery: its index slowly increases by about 100 pages a week, while still missing 2000+.

A quick search revealed that both WordPress and Gallery had automatic sitemap generators.

Installing the plug-in/module was rather simple, and enabling either was just a few clicks. After the sitemaps were generated, I used the Google Sitemap Validator to see if there were any problems. Apparently the WordPress plug-in issues a priority of 1 instead of 1.0, which the validator didn’t like. I began looking at the code to see where I could fix it, but it seemed like more hassle than it was worth, as the code had some weird calculation converting ints to strings and vice versa. I ended up just setting the homepage priority to 0.9 in the control panel, thinking 0.9 isn’t that much different from 1.0.
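For what it’s worth, the 1 versus 1.0 problem comes down to how the number is turned into a string. Here is a minimal sketch in Python (not the plug-in’s actual PHP code; `format_priority` is a made-up helper name) that always renders one decimal place:

```python
def format_priority(value):
    """Render a sitemap priority with exactly one decimal place.

    Hypothetical helper for illustration; it is not taken from the
    WordPress plug-in.
    """
    number = float(value)
    # The sitemap protocol limits priority to the range 0.0 to 1.0.
    if not 0.0 <= number <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    return f"{number:.1f}"


print(format_priority(1))      # an int still comes out with a decimal point
print(format_priority("0.9"))  # strings are accepted too
```

Formatting at the very last step means it doesn’t matter whether the value was an int, a float, or a string earlier on.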

Next, I had to create a sitemap for my main HD-Trailers.net page. The protocol documentation was pretty helpful, and given that I already had 3 sitemaps as reference guides, I whipped up some code to create the sitemap for the main page.
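Generating a sitemap by hand could look roughly like the sketch below. This is not the code I actually used; it assumes Python’s standard `xml.etree.ElementTree`, and the function name `build_sitemap`, the example.com URLs, and the changefreq/priority values are all placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from (loc, changefreq, priority) tuples.

    Hypothetical helper for illustration only.
    """
    # Serialize the sitemap namespace as the default (unprefixed) one.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        # Always write one decimal place, so 1 becomes "1.0".
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; a real script would list the site's actual URLs.
pages = [
    ("https://www.example.com/", "daily", 1),
    ("https://www.example.com/about/", "monthly", 0.5),
]
print(build_sitemap(pages))
```

Only `<loc>` is required for each URL; `<changefreq>` and `<priority>` are optional hints.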

Reading on, it turns out that robots.txt has to be in the root directory and, from what I read, it only supports 1 sitemap per robots.txt. So given that there’s both a blog sitemap and the main page sitemap, I needed to merge the sitemaps into one, which wasn’t too difficult a task.
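Merging boils down to copying every `<url>` entry from each sitemap into a single `<urlset>`. Here is a rough sketch of that idea, again assuming Python’s `xml.etree.ElementTree`; `merge_sitemaps` and the sample documents are made up for illustration and are not my actual merge code.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def merge_sitemaps(documents):
    """Combine several <urlset> XML strings into one, skipping duplicate URLs.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", SITEMAP_NS)
    merged = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    seen = set()
    for document in documents:
        root = ET.fromstring(document)
        for url in root.findall(f"{{{SITEMAP_NS}}}url"):
            loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
            # Keep the first entry seen for each URL.
            if loc and loc not in seen:
                seen.add(loc)
                merged.append(url)
    return ET.tostring(merged, encoding="unicode")


# Two tiny placeholder sitemaps that share one URL.
main_site = (
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>https://www.example.com/</loc></url>"
    "</urlset>"
)
blog = (
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/blog/</loc></url>"
    "</urlset>"
)
print(merge_sitemaps([main_site, blog]))
```

The merged file is then referenced from robots.txt with a line such as `Sitemap: https://www.example.com/sitemap.xml` (placeholder URL; the directive takes the full URL of the sitemap).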

However, I found out later that there’s also this sitemap index format which I could’ve used to point to multiple sitemaps instead of merging them. Maybe I’ll change that later. For now, it should do its job fine.
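A sitemap index is just a small XML file listing the locations of the individual sitemaps. A sketch of generating one, assuming Python’s `xml.etree.ElementTree`, with a hypothetical `build_sitemap_index` helper and placeholder URLs:

```python
import xml.etree.ElementTree as ET

# The index format uses the same namespace as regular sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Build a <sitemapindex> document pointing at each sitemap URL.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Placeholder locations for the main-page and blog sitemaps.
print(build_sitemap_index([
    "https://www.example.com/sitemap-main.xml",
    "https://www.example.com/blog/sitemap.xml",
]))
```

With this approach each generator keeps writing its own file, and only the index needs to be listed in robots.txt.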

After your sitemaps are ready, you can submit them to Google and Yahoo!. I couldn’t find a submission page for MSN/Live, but maybe they’ll be able to pick it up from my robots.txt.
