Submit Blogger sitemap to Google, Yahoo and Bing

Sitemaps let search engines know more about the structure of your website. Blogs, like any other websites, can improve their visibility on the Internet by adding their XML sitemaps to the major search engines (Google, Yahoo and Bing).

The problem with Blogger is that you can’t upload your own sitemap (or any other file) to your blogspot sub-domain root (e.g. http://sub-domain.blogspot.com/) or custom domain root (e.g. http://www.yanniel.info/).

Don’t panic! There’s a workaround for this: luckily for bloggers, sitemaps can be generated as feeds, meaning that you can actually submit an RSS or Atom feed as a valid sitemap.

Blogger supports both RSS and Atom formats. Anyway, I advise you to use the Atom feed URL, because I have had problems when submitting the RSS URL. What problems? Well, I don’t recall now, but I’m pretty sure I had problems ;-)

Here is the Atom sitemap URL for Blogger (it works for both blogspot sub-domains and custom domains):

http://sub-domain.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

http://www.customdomain.com/atom.xml?redirect=false&start-index=1&max-results=500

Basically, the important part is:

atom.xml?redirect=false&start-index=1&max-results=500

A brief explanation:
  • atom.xml indicates that you are requesting an XML Atom feed.
  • redirect=false prevents Blogger from redirecting your sitemap to a third-party feed service. This is very useful if you are using FeedBurner, because FeedBurner feeds are not recognized as valid sitemaps in most cases.
  • start-index=1 indicates that you want to syndicate starting from your first post. If you choose 10, for instance, your sitemap will start at post number 10.
  • max-results=500 tells Blogger to include 500 posts in your sitemap. You can change this number as well, but the count of posts syndicated in your feed will never exceed 500.
So, what happens if my blog has more than 500 posts? Well, you simply add a second sitemap, a third, and so on. See the URLs:

http://sub-domain.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

http://sub-domain.blogspot.com/atom.xml?redirect=false&start-index=501&max-results=500

http://sub-domain.blogspot.com/atom.xml?redirect=false&start-index=1001&max-results=500
................................................
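By the way, if your blog has many posts and you don’t want to build these URLs by hand, here is a minimal Delphi sketch that prints them. PrintSitemapURLs and its parameters are hypothetical names of mine, not anything Blogger provides:

.................
uses
  SysUtils; // for Format

procedure PrintSitemapURLs(const aBlogRoot: string; aPostCount: Integer);
var
  StartIndex: Integer;
begin
  StartIndex := 1;
  while StartIndex <= aPostCount do
  begin
    // Each sitemap covers at most 500 posts, starting at StartIndex.
    Writeln(Format('%satom.xml?redirect=false&start-index=%d&max-results=500',
      [aBlogRoot, StartIndex]));
    Inc(StartIndex, 500); // next batch starts at 1, 501, 1001, ...
  end;
end;
.................

For instance, PrintSitemapURLs('http://sub-domain.blogspot.com/', 1200) prints exactly the three URLs above.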

There’s only one thing pending: you need to add your sitemaps to the major search engines. Submitting your sitemaps to each search engine is a slightly different process; I might cover those SEO topics in future posts.

Fetching a web page with Delphi

This function fetches the HTML content of a given web page. It takes the page's URL as a parameter and returns the corresponding HTML text. The name Curl comes from PHP's Client URL Library (cURL), which can be used (among other things) for the same purpose.
.................
implementation

uses
  IdHTTP;

function Curl(aURL: string): string;
const
  cUSER_AGENT = 'Mozilla/4.0 (MSIE 6.0; Windows NT 5.1)';
var
  IdHTTP: TIdHTTP;
  Stream: TStringStream;
begin
  Result := '';
  IdHTTP := TIdHTTP.Create(nil);
  Stream := TStringStream.Create(''); // pass '' so it also compiles on older Delphi versions
  try
    // Identify ourselves; some servers reject requests without a user agent.
    IdHTTP.Request.UserAgent := cUSER_AGENT;
    try
      // Download the page into the stream and hand back its text.
      IdHTTP.Get(aURL, Stream);
      Result := Stream.DataString;
    except
      // Swallow any HTTP/socket error and return an empty string instead.
      Result := '';
    end;
  finally
    Stream.Free;
    IdHTTP.Free;
  end;
end;
.................
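A quick (hypothetical) usage example, fetching this blog's home page:

.................
var
  HTML: string;
begin
  HTML := Curl('http://www.yanniel.info/'); // an empty result means the request failed
end;
.................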

You can modify this routine to have the web page saved to a file instead. For that, you only need to call the TStringStream.SaveToFile method instead of reading TStringStream.DataString. A sketch of that variant follows.
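Here is a minimal sketch of that variant; CurlToFile is just a name I made up, and I'm assuming a Delphi version where TStringStream inherits SaveToFile:

.................
procedure CurlToFile(const aURL, aFileName: string);
var
  IdHTTP: TIdHTTP;
  Stream: TStringStream;
begin
  IdHTTP := TIdHTTP.Create(nil);
  Stream := TStringStream.Create('');
  try
    IdHTTP.Get(aURL, Stream);       // download the page into the stream
    Stream.SaveToFile(aFileName);   // write the fetched HTML to disk
  finally
    Stream.Free;
    IdHTTP.Free;
  end;
end;
.................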

One final observation: you may change the cUSER_AGENT constant to whatever value you prefer. If you don’t specify a user agent, Indy will provide a default one.

Ah! Don’t forget to add IdHTTP to the uses clause!