
Re: [Bug-wget] How to ignore link like "index.html?lang=ja"?


From: Micah Cowan
Subject: Re: [Bug-wget] How to ignore link like "index.html?lang=ja"?
Date: Sun, 06 Jun 2010 13:54:28 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4

On 06/06/2010 01:46 PM, Guillaume Turri wrote:
> Tony Lewis wrote:
>> Guillaume Turri wrote:
>>
>>  
>>> In fact, why is this option treated after a download?
>>>     
>>
>> When mirroring, all HTML files have to be downloaded (whether or not
>> it is desired to ultimately keep the HTML file) in order to find all
>> the interesting files. For example:
>>
>> wget http://www.somesite.com/index.html --mirror --accept=pdf
> Indeed. I didn't realise it could be used that way.
> 
> Thank you for this explanation.

Yeah, that was the original thinking. But I still hate it. For one
thing, there are no longer any guarantees that recurse-able HTML files
end in ".html"; for another, it does the wrong thing if you want to do
-r -l1 -A.pdf (just grab all the pdf links from the given page). It's
better to let you explicitly specify which files to download, and a
separately specified set of files to be deleted afterwards (or, more
accurately, files to download only for parsing/recursion purposes, as at
some point in the future we might not actually download all files
directly to disk just in order to parse them).
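
To make the idea of two separately specified sets concrete, here is a rough sketch in Python. It is not how wget works today and none of these names are wget options: classify(), KEEP, PARSE_ONLY and SKIP are hypothetical, and matching shell-style patterns against the last path component is just an assumption made for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical outcomes; these are illustration only, not wget behaviour.
KEEP = "keep"              # download and leave on disk
PARSE_ONLY = "parse-only"  # fetch only to extract links for recursion
SKIP = "skip"              # do not fetch at all


def classify(url, keep_patterns, parse_patterns):
    """Classify a URL by the last component of its path.

    The query string is ignored, so "index.html?lang=ja" is matched
    as "index.html".
    """
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    if any(fnmatchcase(name, pattern) for pattern in keep_patterns):
        return KEEP
    if any(fnmatchcase(name, pattern) for pattern in parse_patterns):
        return PARSE_ONLY
    return SKIP


# Mirror-style run: keep PDFs, fetch HTML only to find more links.
print(classify("http://www.somesite.com/docs/manual.pdf", ["*.pdf"], ["*.html"]))
print(classify("http://www.somesite.com/index.html?lang=ja", ["*.pdf"], ["*.html"]))

# "Just the PDFs linked from this page": nothing is fetched for parsing.
print(classify("http://www.somesite.com/other.html", ["*.pdf"], []))
```

With an empty parse set, linked HTML pages are never fetched, which is the behaviour one would want from the -r -l1 -A.pdf case above.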

-- 
Micah J. Cowan
http://micah.cowan.name/


