In this post I am implementing the Delphi way to request the Google Toolbar’s PageRank (PR). As an example, if I want to look up the PR of my blog (http://www.yanniel.info/), I will have to query the following URL:
http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=64012521073&q=info:http://www.yanniel.info/
So, if you try
Writeln(Curl('http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=64012521073&q=info:http://www.yanniel.info/'));
You will get the following output:
Rank_1:1:0
Oops, at this moment my PageRank is pathetic (PR = 0). Hopefully it will get better someday ;-)
You can find the implementation of the Curl function in my previous post: Fetching a web page with Delphi.
Now that you get the idea, it’s time to generalize our implementation in order to get the PR of any web page.
In the URL above there are two dynamic parameters that we need to understand:
Parameter q: This is the concatenation of “info:” plus the encoded URL for which we want to get the PR. It is mandatory to lead the encoded URL with “http://”. Don’t forget to put a “/” after the top domain, ei, “.info/”, not “.info”.
Parameter ch: Obtaining this parameter is the tricky part. We have to hash the concatenation of “info:” plus our target URL (ei. http://www.yanniel.info/) using the minimal perfect hashing algorithm invented by Bob Jenkins. The outcome will be a number, which corresponds to the value of our ch parameter. We are hashing something like “info:http://www.yanniel.info/”.
I didn’t have the time (or the desire) to understand and implement the minimal perfect hashing algorithm from scratch. So, I ported this C# implementation into Delphi code.
There’s one detail worth mentioning about the implementation. In the original C# code there’s a switch statement that falls all the way through the different cases. I needed to combine a case statement and some goto statements in order to accomplish the same in Delphi.
I wrote the strikeout text above initially; but Brian (first comment below) found a clean way to avoid the goto in this situation. I really thank Brian for his contribution, and next time, I won't quit that easily in removing the goto ;-)
I am putting everything together in the Delphi console application below. I bet you can reuse the BobJenkinsHash function in the Generics.Default.pas unit to make the PR implementation more compact. I didn’t know about the BobJenkinsHash function shipped with Delphi before reading Nick’s post; which is why I am keeping my original implementation.
Finally, this is how you request the Google PageRank in Delphi:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses System.SysUtils, IdHTTP, IdURI, Classes;
type TByteArray = array of Byte;
function Curl(aURL: string): string;const cUSER_AGENT = 'Mozilla/4.0 (MSIE 6.0; Windows NT 5.1)';var IdHTTP: TIdHTTP; Stream: TStringStream;begin Result := ''; IdHTTP := TIdHTTP.Create(nil); Stream := TStringStream.Create; try IdHTTP.Request.UserAgent := cUSER_AGENT; try IdHTTP.Get(aURL, Stream); Result := Stream.DataString; except Result := ''; end; finally Stream.Free; IdHTTP.Free; end;end;
procedure Hashing(var a, b, c: Integer);begin a:= a - b; a:= a - c; a:= a xor (c shr 13); b:= b - c; b:= b - a; b:= b xor (a shl 8); c:= c - a; c:= c - b; c:= c xor (b shr 13); a:= a - b; a:= a - c; a:= a xor (c shr 12); b:= b - c; b:= b - a; b:= b xor (a shl 16); c:= c - a; c:= c - b; c:= c xor (b shr 5); a:= a - b; a:= a - c; a:= a xor (c shr 3); b:= b - c; b:= b - a; b:= b xor (a shl 10); c:= c - a; c:= c - b; c:= c xor (b shr 15);end;
function StrOrd(aValue: string): TByteArray;var I: Integer;begin SetLength(Result, Length(aValue)); for I:= 1 to Length(aValue) do Result[I-1]:= Ord(aValue[I]);end;
function PerfectHash(aURL: string): string;var url: TByteArray; _length, k, len: Integer; a, b, c: Integer;
label One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten;
begin url:= StrOrd(aURL); _length:= Length(url); k:= 0; len:= _length; a:= $9E3779B9; b:= $9E3779B9; c:= $E6359A60;
while (len >= 12) do begin a:= a + (url[k + 0] + (url[k + 1] shl 8) + (url[k + 2] shl 16) + (url[k + 3] shl 24) );
b:= b + (url[k + 4] + (url[k + 5] shl 8) + (url[k + 6] shl 16) + (url[k + 7] shl 24) );
c:= c+ (url[k + 8] + (url[k + 9] shl 8) + (url[k + 10] shl 16) + (url[k + 11] shl 24));
Hashing(a, b, c); k:= k + 12; len:= len - 12; end;
c:= c + integer(_length); case len of 11: begin c:= c + (url[k + 10] shl 24); goto Ten; end; 10: begin Ten: c:= c + (url[k + 9] shl 16); goto Nine; end; 9: begin Nine: c:= c + (url[k + 8] shl 8); goto Eight; end; 8: begin Eight: b:= b + (url[k + 7] shl 24); goto Seven; end; 7: begin Seven: b:= b + (url[k + 6] shl 16); goto Six; end; 6: begin Six: b:= b + (url[k + 5] shl 8); goto Five; end; 5: begin Five: b:= b + (url[k + 4]); goto Four; end; 4: begin Four: a:= a + (url[k + 3] shl 24); goto Three; end; 3: begin Three: a:= a + (url[k + 2] shl 16); goto Two; end; 2: begin Two: a:= a + (url[k + 1] shl 8); goto One; end; 1: begin One: a:= a + (url[k + 0]); end; end;
Hashing(a, b, c); Result:= Format('6%0:u', [c]);end;
function GetPageRank(aURL: string): string;const cToolbarQueriesURL = 'http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=%0:s&q=%1:s';var CH, Q: string;begin aURL:= StringReplace(aURL, 'http://', '', []); if Pos('/', aURL) = 0 then aURL:= aURL + '/'; aURL:= 'http://' + aURL;
CH:= PerfectHash('info:' + aURL); Q:= 'info:' + TIdURI.URLEncode(aURL);
Result:= Curl(Format(cToolbarQueriesURL, [CH, Q]));end;
begin Writeln(GetPageRank('http://www.yanniel.info/')); Writeln(GetPageRank('http://www.delphifeeds.com/')); Writeln(GetPageRank('http://www.google.com/')); Readln;end.
Here is how to code withot the GoTo. Replace the PerfectHash function with the one below. Obviously, it could be cleaned up more but the concept works.
ReplyDelete-Brian Marple
function PerfectHash(aURL: string): string;
var
url: TByteArray;
_length,
k, len: Integer;
a, b, c: Integer;
//Use this internal function to avoid GoTo statements
function DoStep(iStep: integer): integer;
begin
case iStep of
11: c:= c + (url[k + 10] shl 24);
10: c:= c + (url[k + 9] shl 16);
9: c:= c + (url[k + 8] shl 8);
8: b:= b + (url[k + 7] shl 24);
7: b:= b + (url[k + 6] shl 16);
6: b:= b + (url[k + 5] shl 8);
5: b:= b + (url[k + 4]);
4: a:= a + (url[k + 3] shl 24);
3: a:= a + (url[k + 2] shl 16);
2: a:= a + (url[k + 1] shl 8);
1: a:= a + (url[k + 0]);
end;
Result := pred(iStep);
end;
begin
url:= StrOrd(aURL);
_length:= Length(url);
k:= 0;
len:= _length;
a:= $9E3779B9;
b:= $9E3779B9;
c:= $E6359A60;
while (len >= 12) do
begin
a:= a + (url[k + 0] + (url[k + 1] shl 8) +
(url[k + 2] shl 16) + (url[k + 3] shl 24) );
b:= b + (url[k + 4] + (url[k + 5] shl 8) +
(url[k + 6] shl 16) + (url[k + 7] shl 24) );
c:= c+ (url[k + 8] + (url[k + 9] shl 8) +
(url[k + 10] shl 16) + (url[k + 11] shl 24));
Hashing(a, b, c);
k:= k + 12;
len:= len - 12;
end;
c:= c + integer(_length);
//use function to avoid GoTo statements
while len > 0 do
len := DoStep( len );
Hashing(a, b, c);
Result:= Format('6%0:u', [c]);
end;
Brian, I cannot thank you enough for sharing this with me. Your implementation is elegant and clean. Once again thank you!
ReplyDelete