wget-dev

Re: wget2 | no-parent option not working as expected (#620)


From: @rockdaboot
Subject: Re: wget2 | no-parent option not working as expected (#620)
Date: Sun, 11 Dec 2022 19:00:44 +0000



Tim Rühsen commented:


These seem to be two separate issues.

>Documentation: -np, --no-parent Do not ever ascend to the parent directory 
>when retrieving recursively. This is a useful option, since it guarantees that 
>only the files below a certain hierarchy will be downloaded.

Wget2 (like Wget) needs to download and scan HTML and CSS files even when they 
lie outside the parent directory. Wget2 only parses them in memory and does 
not save them to disk. Wget stores them temporarily on disk, then parses and 
removes them, which is less efficient in terms of CPU and I/O.

The same is true for `--reject-regex`: matching files are still downloaded if 
they are HTML/CSS, but they are excluded from saving.
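
To illustrate the behaviour described above, here is a minimal Python sketch 
of the "parse but do not save" decision. It is not Wget2's actual code: the 
names `is_below_start`, `decide` and `PARSEABLE_TYPES` are hypothetical, and 
the sketch assumes the content type is already known from the response.

```python
from urllib.parse import urlsplit

# Content types that are scanned for further links (per the description above).
PARSEABLE_TYPES = {"text/html", "text/css"}


def is_below_start(url: str, start_url: str) -> bool:
    """Return True if url is on the same host and inside start_url's directory."""
    u, s = urlsplit(url), urlsplit(start_url)
    if u.netloc != s.netloc:
        return False
    # Directory part of the start URL, always ending in "/".
    base_dir = s.path.rsplit("/", 1)[0] + "/"
    return (u.path or "/").startswith(base_dir)


def decide(url: str, start_url: str, content_type: str) -> tuple[bool, bool]:
    """Return (parse_in_memory, save_to_disk) for a fetched response."""
    parse = content_type in PARSEABLE_TYPES      # HTML/CSS is always scanned
    save = is_below_start(url, start_url)        # only in-hierarchy files are kept
    return parse, save
```

With a start URL of `https://example.com/a/b/`, an HTML page at 
`https://example.com/a/index.html` would be parsed but not saved, which matches 
the traffic seen despite `--no-parent`.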


The second issue seems to be excessive memory usage? Well, if *many* URLs are 
found, they have to be queued in memory for downloading and later marked as 
"has been downloaded".
The `Todo` number in your screenshot indicates that there are quite a lot of 
files to be downloaded. Just to store this list, you possibly need some GB of 
RAM!?
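
As a rough back-of-the-envelope check, the size of such a queue can be 
estimated as below. All numbers are assumptions chosen for illustration (10 
million URLs, 100 bytes per URL, 100 bytes of bookkeeping per entry); they are 
not measurements of Wget2, and `estimate_queue_bytes` is a hypothetical helper.

```python
def estimate_queue_bytes(n_urls: int, avg_url_len: int, overhead_per_entry: int) -> int:
    """Estimate the memory needed to hold n_urls queued URLs, in bytes."""
    return n_urls * (avg_url_len + overhead_per_entry)


# Assumed values, for illustration only.
estimate = estimate_queue_bytes(n_urls=10_000_000, avg_url_len=100, overhead_per_entry=100)
print(f"{estimate / 1e9:.1f} GB")  # prints "2.0 GB"
```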

But yeah, let me check this (hopefully next weekend, I currently have no time 
during the week).

-- 
Reply to this email directly or view it on GitLab: 
https://gitlab.com/gnuwget/wget2/-/issues/620#note_1204427085
You're receiving this email because of your account on gitlab.com.



