Re: [Bug-wget] --header="Accept-encoding: gzip"
From: Tim Ruehsen
Subject: Re: [Bug-wget] --header="Accept-encoding: gzip"
Date: Wed, 23 Sep 2015 10:10:12 +0200
User-agent: KMail/4.14.10 (Linux/4.1.0-2-amd64; KDE/4.14.12; x86_64; ; )
> wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0)
> Gecko/20100101 Firefox/10.0" -e robots=off --header="accept-encoding: gzip
> " -p -H "www.google.com"
>
> Still only gives me 52 kb! and one file: index.html
>
> So, accept encoding seems to work, but only for the main file?
As Ángel said, the main file is gzipped, but wget can't decompress it and
therefore can't parse it for links.
That's why you get just one file (index.html). (This file could be named
index.html.gz to reflect its content.)
You could decompress it yourself with gzip -d and feed the resulting HTML
file to wget, like
  wget -r --force-html --input-file index.html --base www.google.com
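
A minimal sketch of that workaround as a shell helper. The function name
gunzip_if_needed is made up for illustration, and gzip -dc is used because
plain gzip -d normally refuses a file that lacks a .gz suffix. The final wget
call is the one from above, left commented out since it needs network access.

```shell
# Hypothetical helper: if the saved page is gzip data, replace it with the
# decompressed HTML; if it is already plain text, leave it untouched.
gunzip_if_needed() {
    file=$1
    # gzip -dc writes the decompressed data to stdout and exits non-zero
    # when the input is not in gzip format.
    if gzip -dc "$file" > "$file.tmp" 2>/dev/null; then
        mv "$file.tmp" "$file"
    else
        rm -f "$file.tmp"
    fi
}

# Usage, assuming index.html was saved by the earlier wget run:
#   gunzip_if_needed index.html
#   wget -r --force-html --input-file index.html --base www.google.com
```

Decompressing to a temporary file first means a failed or partial
decompression never overwrites the page that was downloaded.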
There have been patches to support gzip encoding, but either they were
half-baked or the authors did not sign the FSF copyright assignment.
*Note*
[Meanwhile, we are working on wget2. Content encodings like gzip and deflate
are already built in there, as are lzma and bzip2 for even better compression
(though servers don't support those out of the box yet).]
Regards, Tim