bug-wget

[Bug-wget] WARC, new version


From: Gijs van Tulder
Subject: [Bug-wget] WARC, new version
Date: Sat, 22 Oct 2011 00:00:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110923 Thunderbird/7.0

Hi all,

Based on the comments by Giuseppe and Ángel, I've revised the implementation of the wget WARC extension. I've attached a patch.

1. It's no longer based on the warctools library. Instead, I've written a couple of new WARC-writing functions, using zlib for the gzip compression. The new implementation is much smaller.
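For readers unfamiliar with the format, here is a rough sketch of what "WARC-writing with gzip" involves. It is not taken from the patch, which is C and uses zlib directly; this is Python, using the standard gzip module (itself built on zlib). The function names `warc_record` and `append_record` are hypothetical. It assumes the WARC/1.0 layout (a version line, named header fields, a blank line, the content block, then two CRLFs) and the common convention of compressing each record as its own gzip member.

```python
import gzip
import io
import uuid
from datetime import datetime, timezone


def warc_record(record_type, block, target_uri=None,
                content_type="application/http;msgtype=response"):
    """Build one uncompressed WARC/1.0 record as bytes (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(block)}",
    ]
    if target_uri is not None:
        headers.append(f"WARC-Target-URI: {target_uri}")
    # A blank line ends the header; two CRLFs end the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + block + b"\r\n\r\n"


def append_record(fileobj, record):
    """Append the record as a separate gzip member (hypothetical helper)."""
    fileobj.write(gzip.compress(record))


# Usage: two records written back to back into an in-memory "file".
buf = io.BytesIO()
append_record(buf, warc_record("response", b"HTTP/1.1 200 OK\r\n\r\nhi",
                               "http://example.com/"))
append_record(buf, warc_record("response", b"HTTP/1.1 404 Not Found\r\n\r\n",
                               "http://example.com/missing"))
```

Because every record is its own gzip member, the concatenated file is still a valid gzip stream, and a reader can seek to a record's offset and decompress only that record.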

2. I extracted a small part of the gethttp function in http.c and moved it to a new function, read_response_body, which is responsible for downloading the response body and writing it to a file.

The WARC extension needs to save the response in multiple cases: when the response is successful, but also when the response is a redirect, a 401 Unauthorized, or an error. Moving the response-saving code into a separate function makes it possible to reuse it in all four situations.
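The shape of that refactoring can be illustrated with a small sketch. This is Python rather than the C in http.c, and only `read_response_body` is a name from the patch; its signature here, along with `classify` and `handle_response`, is invented for illustration. The point is that one body-reading helper serves all four cases, so the WARC output receives every response while the regular download file receives only successful ones.

```python
import io


def read_response_body(stream, out, chunk_size=8192):
    """Copy the remainder of `stream` into `out`; return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        total += len(chunk)
    return total


def classify(status):
    """Map an HTTP status code to one of the four cases (hypothetical)."""
    if 200 <= status < 300:
        return "success"
    if 300 <= status < 400:
        return "redirect"
    if status == 401:
        return "unauthorized"
    return "error"


def handle_response(status, stream, warc_out, file_out):
    """Save the body to the WARC output always, to the file only on success."""
    kind = classify(status)
    body = io.BytesIO()
    read_response_body(stream, body)  # the same helper in every case
    warc_out.write(body.getvalue())
    if kind == "success":
        file_out.write(body.getvalue())
    return kind
```

For example, calling `handle_response(302, io.BytesIO(b"moved"), warc, download)` writes the redirect body to `warc` and leaves `download` untouched.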

Any thoughts?

Thanks,

Gijs

Attachment: wget-warc-patch-20111021.patch
Description: Text Data

