Saturday, November 21, 2009

Getting a Website Offline through the CLI

Mirroring a site was never this easy. No extra software is required; a single-line wget command does it all. Let me explain how I did it.
1. Found a site that had a good AWK manual.
2. Went through the wget manual and located three good options to the command that worked well, though I had a few wrong attempts earlier.
-m : Mirrors the web site
[man] Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.[/man]

-k : Makes the local copy of the site browsable by rewriting its links to point to the local files.
[man]After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.[/man]

-w : Introduces a delay of x seconds between each hit to the server, which lightens the load on it and makes our IP less likely to be identified as a crawler.
[man]Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.[/man]
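
To make the man page excerpts above concrete, here is a small sketch that only prints the commands instead of running them. The URL is a made-up placeholder (an assumption, not the site I mirrored), and the variable names are mine. It spells out what -m expands to according to the man page, and shows the "m" suffix for -w.

```shell
#!/bin/sh
# Placeholder address (assumption); replace it with the site you want to mirror.
SITE_URL="https://example.com/manual/"

# -m as a single option.
SHORT_FORM="wget -m $SITE_URL"

# The same thing written out, as the man page describes -m.
LONG_FORM="wget -r -N -l inf --no-remove-listing $SITE_URL"

# Waiting two minutes between retrievals, using the "m" suffix.
WAIT_TWO_MINUTES="wget -m -w 2m $SITE_URL"

# Print the commands so they can be read before anything is downloaded.
printf '%s\n' "$SHORT_FORM" "$LONG_FORM" "$WAIT_TWO_MINUTES"
```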

So finally, the command for mirroring the whole website looks like this, followed by the address of the site:
# wget -mk -w 5

The '5' given along with -w is the number of seconds to wait between retrievals.
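
For anyone who prefers the long option names, here is a rough sketch of the same command wrapped in a tiny helper. The function name mirror_cmd and the example.com address are made up for illustration; --mirror, --convert-links and --wait are the long forms of -m, -k and -w. The helper only prints the command, so nothing is downloaded until you decide to run it.

```shell
#!/bin/sh
# mirror_cmd is a hypothetical helper: it prints the wget command for a site.
#   $1 = address of the site to mirror
#   $2 = wait between retrievals (optional, defaults to 5 seconds)
mirror_cmd() {
    echo "wget --mirror --convert-links --wait=${2:-5} $1"
}

# Placeholder address (assumption); swap in the real site.
mirror_cmd "https://example.com/manual/"
```

Once the printed command looks right, it can be copied into the terminal and run as is.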

