
GNU Wget 1.8 has been released


From: Hrvoje Niksic
Subject: GNU Wget 1.8 has been released
Date: Mon, 10 Dec 2001 09:52:53 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) XEmacs/21.4 (Copyleft)

GNU Wget 1.8 is now available.  Get it from:

    ftp://ftp.gnu.org/pub/gnu/wget/wget-1.8.tar.gz

or from one of the mirror sites; see the list at
http://www.gnu.org/order/ftp.html.

GNU Wget is a free utility for non-interactive download of files from
the Web.  It supports HTTP, HTTPS, and FTP protocols, as well as
retrieval through HTTP proxies.

It can follow links in HTML pages and create local versions of remote
web sites, fully recreating the directory structure of the original
site.  This is sometimes referred to as "recursive downloading."
While doing that, Wget respects the Robot Exclusion Standard
(/robots.txt).  Wget can be instructed to convert the links in
downloaded HTML files to point to the local files for offline viewing.
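As a sketch of this usage (the URL below is a placeholder):

```shell
# Recursively download a site and convert the links in the
# retrieved HTML files so they work for offline viewing.
wget --recursive --convert-links http://www.example.com/
```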

Please send bug reports to <address@hidden>.

* Changes in Wget 1.8.

** A new progress indicator is now available and used by default.
You can choose the progress bar type with `--progress=TYPE'.  Two
types are available, "bar" (the new default), and "dot" (the old
dotted indicator).  You can permanently revert to the old progress
indicator by putting `progress = dot' in your `.wgetrc'.
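For instance (the URL is a placeholder):

```shell
# Use the old dotted indicator for a single run:
wget --progress=dot http://www.example.com/file.tar.gz

# Or revert permanently by adding this line to ~/.wgetrc:
#   progress = dot
```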

** You can limit the download rate of the retrieval using the
`--limit-rate' option.  For example, `wget --limit-rate=15k URL' will
tell Wget not to download the body of the URL faster than 15 kilobytes
per second.

** Recursive retrieval and link conversion have been revamped:

*** Wget now traverses links breadth-first.  This makes the
calculation of depth much more reliable than before.  Also, recursive
downloads are faster and consume *significantly* less memory than
before.

*** Links are converted only when the entire retrieval is complete.
This is the only safe thing to do, as only then is it known what URLs
have been downloaded.

*** BASE tags are handled correctly when converting links.  Since Wget
already resolves <base href="..."> when handling retrieved URLs, link
conversion now makes the BASE tags point to an empty string.

*** HTML anchors are now handled correctly.  Links to an anchor in the
same document (<a href="#anchorname">), which used to confuse Wget,
are now converted correctly.

*** When in page-requisites (-p) mode, no-parent (-np) is ignored when
retrieving for inline images, stylesheets, and other documents needed
to display the page.

*** Page-requisites (-p) mode now works with frames.  In other words,
`wget -p URL-THAT-USES-FRAMES' will now download the frame HTML files
and all the files needed to display them properly.
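A sketch of page-requisites mode on a frameset page (the URL is a
placeholder):

```shell
# Download the page plus everything needed to display it:
# frame documents, inline images, stylesheets, and so on.
wget -p http://www.example.com/frames.html
```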

** `--base' now works in conjunction with `--input-file', providing a
base for each URL and thereby allowing the URLs in the file to be
relative.
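For example, assuming a file `urls.txt` containing relative URLs:

```shell
# urls.txt might contain relative entries such as:
#   images/logo.png
#   docs/manual.html
# --base resolves each entry against the given base URL:
wget --base=http://www.example.com/ --input-file=urls.txt
```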

** If a host has more than one IP address, Wget tries the remaining
addresses when connecting to the first one fails.

** Host directories now contain port information if the URL is at a
non-standard port.
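To illustrate with a hypothetical URL on port 8000:

```shell
# With -x (force directories), a URL on a non-standard port is
# saved under a host directory that includes the port, e.g.
#   www.example.com:8000/file.html
wget -x http://www.example.com:8000/file.html
```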

** Wget now supports the robots.txt directives specified in
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.

** The URL parser has been fixed; in particular, the infamous
overzealous quoting is gone.  Wget no longer dequotes reserved
characters, e.g. `%3F' is no longer translated to `?', nor `%2B' to
`+'.  Unsafe characters that are not reserved are still escaped, of
course.

** No more than 20 successive redirections are allowed.


