[Bug-wget] Re: How to ignore errors with time stamping

From:	Morten Lemvigh
Subject:	[Bug-wget] Re: How to ignore errors with time stamping
Date:	Fri, 12 Dec 2008 09:03:58 +0100
User-agent:	Thunderbird 2.0.0.18 (X11/20081125)

Andre Majorel wrote:

On 2008-12-11 09:17 +0100, Morten Lemvigh wrote:
I'm having a problem retrieving a page when I use the
time-stamping option.

When I run wget with:
wget -N 'http://eur-lex.europa.eu/JOHtml.do?uri=OJ:C:2007:306:SOM:EN:HTML'

the file is downloaded, but I get the message:
"Last-modified header missing -- time-stamps turned off."
If I run the command a second time, I get an "ERROR 500: Internal Server Error." and wget exits. If I leave the time stamping option out, the document is retrieved again.
Is there a way to make wget ignore missing Last-modified headers, and just retrieve the document?
I believe it's what it does by default. Wget only checks for the
Last-modified header here because you told it to (-N).

Some of the documents on the site are sent with a Last-modified header, and I don't want to retrieve those if I already have them, hence the -N. But on some documents the header is missing, and in that situation wget doesn't do anything with the page; it just continues with the next page. I would like wget to continue looking at the links on that page.

When mirroring a site, wget will stop and not follow any links
on a page that doesn't send a Last-modified header.


Do you have a log showing that behaviour? Recursive retrieval of
sites that don't return Last-modified works for me.

No links on a page with a missing Last-modified header are scanned if the page is already on disk. If I run:


wget -r -N http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML

--08:51:24--  http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML
           => `eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML'
Resolving eur-lex.europa.eu... 147.67.136.2, 147.67.136.102, 147.67.119.2, ...
Connecting to eur-lex.europa.eu|147.67.136.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9.709 (9.5K) [text/html]

100%[=====================================================================================================================================>] 9.709 --.--K/s

Last-modified header missing -- time-stamps turned off.

08:51:24 (82.42 KB/s) - `eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML' saved [9709/9709]

[....]

wget will retrieve the page and continue recursively getting all the linked pages, as I would expect. If I issue this command a second time, all I get is this:


wget -r -N http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML

--08:53:18--  http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML
           => `eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML'
Resolving eur-lex.europa.eu... 147.67.119.2, 147.67.119.102, 147.67.136.2, ...
Connecting to eur-lex.europa.eu|147.67.119.2|:80... connected.
HTTP request sent, awaiting response... 500 Internal Server Error
08:53:18 ERROR 500: Internal Server Error.


FINISHED --08:53:18--
Downloaded: 0 bytes in 0 files

So all the pages linked from this page are ignored too. It's fine if wget skips the problematic document, but I would prefer wget to continue the recursive scan.
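
For anyone trying to reproduce this, it may help to see which URLs are served without a Last-modified header. The following is only a minimal sketch, not something from the logs above: the function names has_last_modified and check_url are hypothetical, and it assumes that the header lines wget prints with -S (--server-response) can simply be searched for the header name. Note that --spider may issue a HEAD request, to which the server could respond differently than to the GET that -N performs.

```shell
#!/bin/sh
# Hypothetical helper: reads response headers on stdin and succeeds
# (exit status 0) if a Last-Modified header line is present.
# The match is case-insensitive and tolerates leading whitespace,
# since wget -S indents the header lines it prints.
has_last_modified() {
    grep -qi '^[[:space:]]*Last-Modified:'
}

# Hypothetical wrapper: asks wget to print the server response for a URL
# without saving the document, then checks the headers.
# wget writes the headers to stderr, hence the 2>&1.
check_url() {
    wget -S --spider "$1" 2>&1 | has_last_modified
}

# Example use (needs network access, so it is left commented out):
# check_url 'http://eur-lex.europa.eu/JOHtml.do?uri=OJ:L:2008:321:SOM:DA:HTML' \
#     && echo "Last-Modified present" || echo "Last-Modified missing"
```

The URL is quoted in the example because it contains a '?', which the shell would otherwise treat as a glob character.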
