Re: [Bug-wget] --header="Accept-encoding: gzip"
From: andreas wpv
Subject: Re: [Bug-wget] --header="Accept-encoding: gzip"
Date: Wed, 23 Sep 2015 21:09:37 -0500
Thanks for the insights, and for working on the next version.
andreas
On Wed, Sep 23, 2015 at 3:10 AM, Tim Ruehsen <address@hidden> wrote:
> > wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0)
> > Gecko/20100101 Firefox/10.0" -e robots=off
> > --header="accept-encoding: gzip" -p -H "www.google.com"
> >
> > Still only gives me 52 kb! and one file: index.html
> >
> > So, accept encoding seems to work, but only for the main file?
>
> As Ángel said, the main file is gzipped but wget can't parse it.
> That's why you just get one file (index.html). (This file could be named
> index.html.gz to reflect the content.)
> You could decompress it yourself with gzip -d and feed the resulting HTML
> file back to wget, like wget -r --force-html --input-file index.html --base
> www.google.com
>
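The workaround described above could be scripted roughly as follows. This is only a sketch under a few assumptions: the function names `gunzip_in_place` and `fetch_with_requisites` are made up for illustration, `head -c` is assumed to be available, and the file is decompressed through stdin because `gzip -d index.html` would reject a file that lacks a `.gz` suffix. The second wget pass deliberately omits the Accept-Encoding header so that any further HTML it fetches arrives uncompressed and can be parsed.

```shell
#!/bin/sh
# Hypothetical helper: decompress FILE in place, but only if it starts
# with the gzip magic bytes (1f 8b). Other files are left untouched.
gunzip_in_place() {
    file=$1
    magic=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        # Read from stdin so gzip does not complain about the suffix.
        gzip -dc < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
}

# Hypothetical wrapper around the two-step workaround from the thread.
# Not called automatically; it needs network access.
fetch_with_requisites() {
    url=$1
    # Step 1: fetch the main page, asking the server for gzip.
    wget -e robots=off --header="Accept-Encoding: gzip" -O index.html "$url" || return 1
    # Step 2: undo the content encoding that wget left in place.
    gunzip_in_place index.html
    # Step 3: let wget parse the now-plain HTML and follow its links.
    wget -r --force-html --input-file=index.html --base="$url"
}
```

Usage would be something like `fetch_with_requisites http://www.google.com/`. Passing a full URL to `--base` is a choice made here; the original suggestion used the bare host name.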
> There have been patches to support gzip encoding, but either they were
> half-baked or the authors did not sign the FSF copyright assignment.
>
> *Note*
> [Meanwhile, we are working on wget2. Content encodings like gzip and
> deflate are already built in there, as are lzma and bzip2 for even better
> compression (but servers don't support them out of the box yet).]
>
> Regards, Tim
>
>