From: | Gijs van Tulder |
Subject: | [Bug-wget] WARC, new version |
Date: | Sat, 22 Oct 2011 00:00:42 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110923 Thunderbird/7.0 |
Hi all,

Based on the comments by Giuseppe and Ángel, I've revised the implementation of the wget WARC extension. I've attached a patch.
1. It's no longer based on the warctools library. Instead, I've written a couple of new WARC-writing functions, using zlib for the gzip compression. The new implementation is much smaller.
2. I extracted a small part of the gethttp function in http.c and moved it to a new function, read_response_body, which is responsible for downloading the response body and writing it to a file.
The WARC extension needs to save the response in several cases: not only when the request succeeds, but also when the response is a redirect, a 401 Unauthorized, or an error. Moving the response-saving code into a separate function makes it possible to reuse it in all four situations.
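As a rough illustration of point 1, here is a minimal sketch of what a WARC-writing helper might look like. It is not taken from the patch. The function name warc_format_record and all of its parameters are hypothetical, and the gzip step is left out on purpose: the sketch only formats one uncompressed WARC/1.0 record (version line, named fields, blank line, content block, two trailing CRLFs) into a buffer. A real writer would then pass those bytes through zlib so that each record becomes its own gzip member.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): formats one uncompressed
   WARC/1.0 record into buf.  Returns the number of bytes written, or -1
   if the record does not fit into cap bytes. */
static int warc_format_record(char *buf, size_t cap,
                              const char *type, const char *record_id,
                              const char *date, const char *content_type,
                              const char *block, size_t block_len)
{
    assert(buf && type && record_id && date && content_type && block);

    /* Record header: version line, named fields, then an empty line. */
    int n = snprintf(buf, cap,
                     "WARC/1.0\r\n"
                     "WARC-Type: %s\r\n"
                     "WARC-Record-ID: %s\r\n"
                     "WARC-Date: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     type, record_id, date, content_type, block_len);

    /* snprintf reports the length it wanted, so this also catches truncation. */
    if (n < 0 || (size_t)n + block_len + 4 > cap)
        return -1;

    /* Content block, followed by the two CRLFs that end every record. */
    memcpy(buf + n, block, block_len);
    memcpy(buf + n + block_len, "\r\n\r\n", 4);
    return n + (int)block_len + 4;
}
```

A caller would pass the raw HTTP response (status line, headers and body) as the block, with "response" as the type and "application/http;msgtype=response" as the content type, and then hand the resulting bytes to the compression step.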
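The reuse described in point 2 can be sketched roughly as follows. This is not the code from the patch: classify_status, read_body_sketch and handle_response are hypothetical names, the "body reader" only copies between memory buffers instead of reading from a socket, and the rule that non-success bodies are skipped when WARC is off is an assumption made for the sketch. The point is only that one shared helper serves all four kinds of response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four situations named in the message. */
enum response_kind { RESP_OK, RESP_REDIRECT, RESP_UNAUTHORIZED, RESP_ERROR };

/* Hypothetical classifier; anything that is not 2xx, 3xx or 401 is
   treated as an error in this sketch. */
static enum response_kind classify_status(int statcode)
{
    if (statcode >= 200 && statcode < 300) return RESP_OK;
    if (statcode >= 300 && statcode < 400) return RESP_REDIRECT;
    if (statcode == 401) return RESP_UNAUTHORIZED;
    return RESP_ERROR;
}

/* Stand-in for the shared body reader: copies at most dst_cap bytes of
   src into dst and returns the number of bytes copied. */
static size_t read_body_sketch(const char *src, size_t src_len,
                               char *dst, size_t dst_cap)
{
    assert(src && dst);
    size_t n = src_len < dst_cap ? src_len : dst_cap;
    memcpy(dst, src, n);
    return n;
}

/* Every kind of response goes through the same helper when WARC output
   is enabled.  Assumption of this sketch: without WARC, only successful
   responses have their body saved. */
static size_t handle_response(int statcode, bool warc_enabled,
                              const char *body, size_t body_len,
                              char *out, size_t out_cap)
{
    enum response_kind kind = classify_status(statcode);
    if (kind == RESP_OK || warc_enabled)
        return read_body_sketch(body, body_len, out, out_cap);
    return 0;
}
```

With WARC enabled, a 404 body is saved through the same call as a 200 body. With WARC disabled, the sketch returns 0 for the 404 and saves nothing.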
Any thoughts?

Thanks,
Gijs
wget-warc-patch-20111021.patch
Description: Text Data