On Mon, 13 Oct 2008, Eric F Crist wrote:

> On Oct 13, 2008, at 12:52 PM, Robert De Mars wrote:
>
>> I was wondering if anyone knew of an open source project that can do 
>> the following.
>>
>> I have an internal web server at work that employees use for various 
>> things.
>>
>> I am looking for a piece of software (or several pieces if needed) that 
>> would crawl various industry-related websites and then save a local 
>> copy of the articles.  I would like the software to collect the 
>> selected content and, when it is done crawling, create an index file 
>> where employees can see various industry news on one page.
>
>
> wget and curl can do this for you.  wget is the more capable of the 
> two, including rewriting of paths and such for local viewing.  Pretty 
> common thing to do.
>
> As far as the 'index' page, that's something you'd have to munge 
> together yourself, I think.
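
As far as the index page goes, a small script run after each crawl 
could stitch one together.  Here's a rough sketch in Python of the 
sort of thing I mean -- the /var/www/mirrors path is just a 
placeholder for wherever wget or HTTrack drops its files, and it 
simply links every .html file it finds, grouped by site:

#!/usr/bin/env python
# Rough sketch: walk the mirror directory that wget/HTTrack writes
# into and emit one index.html linking to every page it finds.
# MIRROR_ROOT and the output name are placeholders -- adjust to taste.

import os
from xml.sax.saxutils import escape, quoteattr

MIRROR_ROOT = "/var/www/mirrors"                  # where the crawls land
OUTPUT = os.path.join(MIRROR_ROOT, "index.html")  # page employees load

# Group pages by their top-level directory, i.e. by crawled site.
sections = {}
for dirpath, dirnames, filenames in os.walk(MIRROR_ROOT):
    for name in filenames:
        if not name.endswith((".html", ".htm")):
            continue
        rel = os.path.relpath(os.path.join(dirpath, name), MIRROR_ROOT)
        if rel == "index.html":
            continue                              # skip the index itself
        site = rel.split(os.sep)[0]
        sections.setdefault(site, []).append(rel)

# Write a very plain index: one heading per site, one link per page.
out = open(OUTPUT, "w")
out.write("<html><head><title>Industry news</title></head><body>\n")
for site in sorted(sections):
    out.write("<h2>%s</h2>\n<ul>\n" % escape(site))
    for rel in sorted(sections[site]):
        out.write("<li><a href=%s>%s</a></li>\n" % (quoteattr(rel), escape(rel)))
    out.write("</ul>\n")
out.write("</body></html>\n")
out.close()

Cron could rerun the crawl and then this script nightly, so the index 
stays current without anyone touching it.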


I have used wget and another program called HTTrack, but I'm not sure that 
wget can properly rewrite the pages for local viewing.  Has anyone 
compared these two programs?  Like wget, HTTrack is GPL'd and available 
for Linux:

http://www.httrack.com/
http://www.gnu.org/software/wget/

If wget does the job just as well, I'd probably use it, since it is 
present on more systems and I use it far more often.
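
If memory serves, the options aimed at local rewriting are something 
like

   wget --mirror --convert-links --page-requisites --html-extension \
        --no-parent http://www.example.com/news/

(example.com standing in for one of the industry sites).  Whether the 
result is as clean as what HTTrack produces is exactly what I'd like 
to hear about.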

Mike