bug-wget

Re: [Bug-wget] WARC File Creation - Scope Issues


From: Tim Ruehsen
Subject: Re: [Bug-wget] WARC File Creation - Scope Issues
Date: Fri, 12 Apr 2013 10:32:13 +0200
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )

Hello Mark,

To capture a single document, just run e.g.
wget --warc-file single_page 'https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallationandConfigurationGuide-URLsandWebApplications'

To save the page plus its requisites (everything needed to display it),
add the -p / --page-requisites option. Consult the man page for a detailed
explanation.
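
A minimal sketch of the -p variant described above, assuming wget 1.14 or
later (the first version with WARC support). The output name
single_page_with_requisites and the capture_page wrapper are hypothetical
and only there for illustration; the URL is the one from the question.

```shell
#!/bin/sh
# URL from the original question, kept exactly as given.
url='https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallationandConfigurationGuide-URLsandWebApplications'

# Hypothetical base name for the archive (an assumption, pick any name).
warc_name='single_page_with_requisites'

# Hypothetical wrapper so the command can be reviewed before it is run.
capture_page() {
    # --page-requisites: also fetch the images, stylesheets, etc. needed to
    #                    display the page
    # --warc-file:       record the transfer in a WARC file with this base name
    wget --page-requisites --warc-file="$warc_name" "$url"
}

# Uncomment to perform the capture (needs network access):
# capture_page
```

Without -l or --mirror there is no recursion, so the capture should stay
limited to the page and what it needs for display.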

Regards, Tim

On Thursday 11 April 2013, McFate, Mark wrote:
> This is not a 'bug' by any means, but I could find no better place to post
> this so please forgive me...
> 
> I've used 'wget' for years but am just now discovering the real power it
> has.  Lately I have upgraded to v1.14 so that I can take advantage of WARC
> file creation.  But I need to learn a lot more.  In particular, I'm having
> trouble controlling the scope of the content returned by wget when using
> the --warc-file option (or even when not).  The --mirror option is nice, but
> in many circumstances it returns far too much information, and limiting
> the return using the -l option requires trial and error as I am never sure
> how deep to set it.
> 
> For example, I would like to retrieve the following set of pages as a WARC,
> but don't really want anything else from this domain: 
> https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallationandConfigurationGuide-URLsandWebApplications
> Is it even possible using wget to capture a complete WARC
> containing only this document?
> 
> So, I'm looking for guidance that might be pertinent to using wget for WARC
> retrieval.  Please point me to anything you think might be helpful. 
> Thanks.
> 
> Mark A. McFate
> Digital Library Applications Developer
> Burling Library, Grinnell College
> Grinnell, IA  50112-1690
> address@hidden


