Fetching a web page with Delphi

This function fetches the HTML content of a given web page. It takes the page's URL as a parameter and returns the corresponding HTML text, or an empty string if the request fails. The name Curl comes from the PHP Client URL Library (cURL), which can be used (among other things) for the same purpose.
.................
implementation

uses
  IdHTTP;

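function Curl(const aURL: string): string;
const
  // User agent sent with the request; change it to suit your needs.
  cUSER_AGENT = 'Mozilla/4.0 (MSIE 6.0; Windows NT 5.1)';
var
  IdHTTP: TIdHTTP;
  Stream: TStringStream;
begin
  Result := '';
  IdHTTP := TIdHTTP.Create(nil);
  Stream := TStringStream.Create;
  try
    IdHTTP.Request.UserAgent := cUSER_AGENT;
    try
      // Download the raw response body into the stream...
      IdHTTP.Get(aURL, Stream);
      // ...and convert it to a string using the stream's encoding.
      Result := Stream.DataString;
    except
      // Any failure (network error, HTTP error status) yields an empty string.
      Result := '';
    end;
  finally
    // Always release both objects, even if the request failed.
    Stream.Free;
    IdHTTP.Free;
  end;
end;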
.................

You can modify this routine to save the web page to a file instead. To do that, call the TStringStream.SaveToFile method instead of reading TStringStream.DataString.
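
As a rough sketch of that change, the variant below writes the downloaded bytes to disk. The name CurlToFile and the aFileName parameter are made up for this illustration, and the code assumes Delphi 2009 or later, where TStringStream has a parameterless constructor and inherits SaveToFile from TMemoryStream.

```pascal
uses
  Classes, IdHTTP;

// Hypothetical helper (not part of the original routine): downloads aURL
// and writes the raw response bytes to aFileName.
// Returns True on success, False if the download or the write fails.
function CurlToFile(const aURL, aFileName: string): Boolean;
var
  HTTP: TIdHTTP;
  Stream: TStringStream;
begin
  Result := False;
  HTTP := TIdHTTP.Create(nil);
  Stream := TStringStream.Create;
  try
    try
      // Fill the stream with the response body...
      HTTP.Get(aURL, Stream);
      // ...and save it as-is, without converting it to a string.
      Stream.SaveToFile(aFileName);
      Result := True;
    except
      Result := False;
    end;
  finally
    Stream.Free;
    HTTP.Free;
  end;
end;
```

Because the bytes are saved exactly as received, the file keeps whatever character encoding the server used.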

One final observation: you can change the cUSER_AGENT constant to any value you like. If you don’t specify a user agent, Indy supplies a default value of its own.
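
If you would rather pick the user agent per call than edit the constant, something along these lines should work. CurlWithAgent and its aUserAgent parameter are names invented for this sketch; it also uses the TIdHTTP.Get overload that returns a string directly, which keeps the example short.

```pascal
uses
  IdHTTP;

// Hypothetical variant of Curl: the caller chooses the user agent.
function CurlWithAgent(const aURL: string;
  const aUserAgent: string = ''): string;
var
  HTTP: TIdHTTP;
begin
  Result := '';
  HTTP := TIdHTTP.Create(nil);
  try
    // Only override the user agent when one was supplied; otherwise
    // Indy's built-in default stays in place.
    if aUserAgent <> '' then
      HTTP.Request.UserAgent := aUserAgent;
    try
      Result := HTTP.Get(aURL);
    except
      // Same convention as Curl: an empty string signals failure.
      Result := '';
    end;
  finally
    HTTP.Free;
  end;
end;
```

Calling CurlWithAgent(aURL) with no second argument leaves the default user agent untouched.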

Ah! Don’t forget to add IdHTTP to the uses clause!

2 comments:

  1. I do not recommend using a TStringStream like this, especially in Delphi 2009+.

    Passing a TStream to TIdHTTP.Get() will download raw bytes into the stream, and then the DataString property will parse those bytes into a String based on the TEncoding passed to its constructor, not the charset that the server actually uses for the bytes. If you specify the wrong TEncoding up front, the bytes will not parse correctly.

    TIdHTTP.Get() has an overload that returns a String; use that instead. Let Get() parse the bytes for you, using the charset that the server actually uses.

    Result := IdHTTP.Get(aURL);
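
Applied to the routine from the post, the commenter's suggestion might look roughly like the sketch below. It keeps the post's names and its convention of returning an empty string on failure, and is only one possible reading of the advice, not code from the commenter.

```pascal
uses
  IdHTTP;

function Curl(const aURL: string): string;
const
  cUSER_AGENT = 'Mozilla/4.0 (MSIE 6.0; Windows NT 5.1)';
var
  IdHTTP: TIdHTTP;
begin
  Result := '';
  IdHTTP := TIdHTTP.Create(nil);
  try
    IdHTTP.Request.UserAgent := cUSER_AGENT;
    try
      // The string-returning overload decodes the response itself,
      // so no intermediate TStringStream is needed.
      Result := IdHTTP.Get(aURL);
    except
      Result := '';
    end;
  finally
    IdHTTP.Free;
  end;
end;
```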
