I have used wget2 to download 69 to 70 pages from a University
College Campus directory. The process has worked with no
problems for many years and cut the download time to about 25 seconds,
but now I get errors if I set it to more than 32 threads.
This command works fine:
wget2 --max-threads=32 --secure-protocol=PFS --base="https://www.uog.edu/" -i testlistuog
testlistuog contains:
directory/?page=01
directory/?page=02
...
...
directory/?page=68
directory/?page=69
I know wget2 was recently updated in the Fedora 38 repo:
GNU Wget2 2.1.0 - multithreaded metalink/file/website downloader
+digest +https +ssl/gnutls +ipv6 +iri +large-file +nls -ntlm -opie
+psl -hsts +iconv +idn2 +zlib -lzma +brotlidec +zstd -bzip2 -lzip
+http2 +gpgme
I don't know whether that update changed something about threading,
or whether some other update is responsible.
I found earlier that the Windows version of wget2 did not work well
with multiple threads, so on Windows I run it with threads set to 1.
Download time on Windows (1 thread):
Time to Download Campus Directory 154.332887 Seconds
Download time on Linux with 32 threads now:
Time to Download Campus Directory 138.430772 Seconds
Previously, with 70 threads, it took about 25 seconds.
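One way to see exactly where the slowdown starts is to time the same Linux download at several --max-threads values. The sketch below assumes the timings above are wall-clock seconds around a std::system() call; the helper names wget2_command and time_command are hypothetical, and the command string uses only the wget2 options that already appear in this post.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Hypothetical helper: build the Linux download command for a given
// thread count, reusing only the wget2 options shown in this post.
std::string wget2_command(int threads) {
    return "wget2 --restrict-file-names=windows --max-threads=" +
           std::to_string(threads) +
           " --secure-protocol=PFS -q"
           " --base=\"https://www.uog.edu/directory/\" -i testlistuog";
}

// Hypothetical helper: run a shell command with std::system and return
// the elapsed wall-clock time in seconds. The raw std::system return
// value is stored in *status when a pointer is supplied.
double time_command(const std::string& cmd, int* status = nullptr) {
    const auto start = std::chrono::steady_clock::now();
    const int rc = std::system(cmd.c_str());
    const auto stop = std::chrono::steady_clock::now();
    if (status != nullptr) *status = rc;
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, calling time_command(wget2_command(t)) for t in 1, 16, 32, 33 and 70 and printing each result would show whether the time jumps once the thread count passes 32, where the errors begin.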
Original lines in the program:
Call to get page 1, which is used to find the total number of pages in the directory:
system("wget2 --restrict-file-names=windows --secure-protocol=PFS -q "
       "\"https://www.uog.edu/directory/?page=01\"");
The program then creates the testlistuog file containing ?page=01 through ?page=<last page number>.
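The list-building step can be sketched as follows. This is an illustration, not the program's actual code: make_page_list and write_page_list are hypothetical names, and the prefix is a parameter because the post shows two variants ("directory/?page=" with --base set to the site root, and "?page=" with --base ending in /directory/). Two-digit zero padding matches the sample list above.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: build the entries for the wget2 -i list file,
// e.g. prefix "?page=" and last_page 3 give ?page=01, ?page=02, ?page=03.
std::vector<std::string> make_page_list(const std::string& prefix, int last_page) {
    std::vector<std::string> lines;
    for (int page = 1; page <= last_page; ++page) {
        char num[16];
        // Zero-pad to two digits: 1 -> "01", 69 -> "69".
        std::snprintf(num, sizeof num, "%02d", page);
        lines.push_back(prefix + num);
    }
    return lines;
}

// Hypothetical helper: write one entry per line, the format read by -i.
// Returns false if the file could not be opened or written.
bool write_page_list(const std::string& path, const std::string& prefix, int last_page) {
    std::ofstream out(path);
    if (!out) return false;
    for (const std::string& line : make_page_list(prefix, last_page)) {
        out << line << '\n';
    }
    return static_cast<bool>(out);
}
```

With --base="https://www.uog.edu/directory/", an entry such as ?page=01 resolves to https://www.uog.edu/directory/?page=01, so write_page_list("testlistuog", "?page=", 69) would produce a list suitable for the Linux call.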
Call on Linux (runs wget2 in the background and loops to display a progress bar while it downloads):
system("wget2 --restrict-file-names=windows --max-threads=70 "
       "--secure-protocol=PFS -q "
       "--base=\"https://www.uog.edu/directory/\" -i testlistuog 2>error & "
       "PID=$! ; printf '[' ; while ps hp $PID > /dev/null ; "
       "do printf '▓'; sleep 1 ; done ; printf '] done!\n'");
This produces an individual file for each page; the program then combines
them into a single allraw.uog file when done.
On Windows, it uses a single thread, downloads pages 1 through the last page,
and writes the output directly to allraw.uog:
system("wget2 --max-threads=1 --restrict-file-names=windows "
       "--secure-protocol=PFS --progress=none "
       "--base=\"https://www.uog.edu/directory/\" -O \"allraw.uog\" "
       "-i testlistuog");
I also ran the wget2 commands outside the C++ program to make sure the
program itself was not causing the issue.
Going from 25 seconds to 138 seconds isn't a huge problem, but the change in
how the program behaves is concerning.
Perhaps the maximum number of threads was changed, or perhaps some other
update in Fedora or in the kernel (6.5.5-200.fc38.x86_64) is responsible?
+------------------------------------------------------------+
Michael D. Setzer II - Computer Science Instructor (Retired)
mailto:mikes@guam.net
mailto:msetzerii@gmail.com
Guam - Where America's Day Begins
G4L Disk Imaging Project maintainer
http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+