bug-wget

[Bug-wget] Invalid Content-Length header in WARC files, on some platforms


From: Gijs van Tulder
Subject: [Bug-wget] Invalid Content-Length header in WARC files, on some platforms
Date: Mon, 12 Nov 2012 22:34:23 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121028 Thunderbird/16.0.2

Hi,

There's a fairly serious issue in the WARC-generating code: on some platforms (presumably those where off_t is not a 64-bit type) the Content-Length header at the top of each WARC record holds an incorrect value. On these platforms it is sometimes 0, sometimes 1, but never the correct length. Because a WARC reader relies on Content-Length to find where each record ends, this makes the whole WARC file unreadable.

The code works fine on many platforms, but it is apparently a problem on some PowerPC and ARM systems, and maybe other systems as well.

Existing WARC files with this problem can be repaired by replacing the value of the Content-Length header with the correct value for each WARC record in the file. The content of the WARC records is intact; only the Content-Length header is wrong.
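
As a rough illustration of the repair idea, here is a minimal sketch in C that recovers the true block length of one record held in memory. It is not part of the patch. It assumes records are WARC/1.0, that the next record can be recognised by the byte sequence "\r\n\r\nWARC/1.0\r\n", and that the last record ends with "\r\n\r\n". A payload that happens to contain that marker would fool this heuristic. The names find_bytes and warc_block_length are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Binary-safe substring search; returns NULL if needle is not found. */
static const char *
find_bytes (const char *hay, size_t hay_len,
            const char *needle, size_t needle_len)
{
  if (needle_len == 0 || hay_len < needle_len)
    return NULL;
  for (size_t i = 0; i + needle_len <= hay_len; i++)
    if (memcmp (hay + i, needle, needle_len) == 0)
      return hay + i;
  return NULL;
}

/* REC points at the start of a record (its "WARC/1.0" line) and LEN is
   the number of bytes available from there to the end of the buffer.
   Returns the length of the record's content block, i.e. the value the
   Content-Length header should have had, or -1 if it cannot be found.
   Assumption: the stored Content-Length is ignored entirely.  */
long long
warc_block_length (const char *rec, size_t len)
{
  static const char blank[] = "\r\n\r\n";
  static const char next_rec[] = "\r\n\r\nWARC/1.0\r\n";

  /* The WARC header ends at the first blank line.  */
  const char *block = find_bytes (rec, len, blank, 4);
  if (block == NULL)
    return -1;
  block += 4;
  size_t rest = len - (size_t) (block - rec);

  /* The block runs up to the separator before the next record...  */
  const char *end = find_bytes (block, rest, next_rec, sizeof next_rec - 1);
  if (end != NULL)
    return (long long) (end - block);

  /* ...or, for the last record, up to the trailing "\r\n\r\n".  */
  if (rest >= 4 && memcmp (block + rest - 4, blank, 4) == 0)
    return (long long) (rest - 4);
  return -1;
}
```

A repair tool built on this would walk the file record by record and rewrite each Content-Length line with the returned value.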

The attached patch fixes the problem in warc.c. It replaces off_t with wgint and uses the number_to_static_string function from util.c to format the length.
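
To illustrate the general idea behind the fix (this is not the patch itself), the sketch below formats a length whose width is fixed at 64 bits on every platform, so the conversion specifier always matches the argument. It uses only standard C: int64_t and PRId64 stand in for wget's wgint and number_to_static_string, and format_content_length is a hypothetical name. That the original bug comes from a width mismatch when printing an off_t is an assumption; the exact cause is not stated above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not wget's API.  Writes a Content-Length header
   line for LENGTH into BUF, which holds SIZE bytes.  Returns the number
   of characters written (excluding the terminating NUL), or -1 if the
   line does not fit.  */
int
format_content_length (char *buf, size_t size, int64_t length)
{
  /* PRId64 expands to the correct conversion for int64_t on every
     platform, whatever the width of long or off_t happens to be.  */
  int n = snprintf (buf, size, "Content-Length: %" PRId64 "\r\n", length);
  return (n >= 0 && (size_t) n < size) ? n : -1;
}
```

With a fixed-width type the output does not depend on how the platform defines off_t, which is the property the switch to wgint is meant to give.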

Regards,

Gijs

Attachment: wget-warc-content-length.patch
Description: Text Data

