From: Antonio Diaz Diaz
Subject: Re: plzip: manual gives very false numbers, real defaults are huge!
Date: Mon, 06 May 2024 16:33:17 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Steffen Nurpmeso wrote:
> Thanks for the quick response on a Saturday.
You are welcome. :-)
> > Note that the number of usable threads is limited by file size; on files larger than a few GB plzip can use hundreds of processors, but on files of only a few MB plzip is no faster than lzip.
>
> Ok, "you get scaling effects", but 70 MiB is not "a few MiB".
The above means "on files of only a few MB plzip can't be faster than lzip, no matter what options you use". Of course, at high compression levels the "few MB" become "several tens of MB".
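To make the arithmetic concrete, here is a minimal sketch (not plzip's actual code) of how a per-thread data size caps the number of usable threads. The 64 MiB block size below is an assumption chosen to match the "67 megabytes per processor" quoted next (64 MiB = 67,108,864 bytes); the real per-level data sizes are documented in the plzip manual:

    #include <stdio.h>

    int main(void)
    {
        const long long file_size = 70LL << 20;  /* the 70 MiB file from the thread */
        const long long data_size = 64LL << 20;  /* assumed per-thread block size */
        const int requested = 16;                /* e.g. one thread per CPU */

        /* Each worker thread compresses one block, so the input can feed
           at most ceil(file_size / data_size) threads at a time. */
        long long blocks = (file_size + data_size - 1) / data_size;
        int usable = blocks < requested ? (int)blocks : requested;

        printf("blocks = %lld, usable threads = %d of %d\n",
               blocks, usable, requested);  /* prints: blocks = 2, ... 2 of 16 */
        return 0;
    }

With these numbers a 70 MiB file feeds only two threads no matter how many CPUs are present; shrinking the block with plzip's --data-size option allows more threads, at some cost in compression ratio.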
> 67 megabytes per processor! (How about doing a stat and somehow taking into account st_size? Or fstat, after the file was opened?)
This would break reproducibility (obtaining identical compressed output from identical input) because the size of uncompressed data read from standard input (not from a file) can't be known in advance.
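For what it's worth, here is a minimal sketch of the fstat idea and where it stops working (illustrative only, not code from plzip): st_size is meaningful only for regular files, so data arriving through a pipe offers no size to scale the thread count by:

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;

        /* fstat on fd 0 (standard input) succeeds for both files and
           pipes, but only a regular file reports a usable st_size. */
        if (fstat(0, &st) != 0) { perror("fstat"); return 1; }

        if (S_ISREG(st.st_mode))
            printf("regular file, st_size = %lld bytes\n",
                   (long long)st.st_size);
        else
            printf("pipe or other non-regular input; size unknown in advance\n");
        return 0;
    }

So './sketch < file' would report a size while 'cat file | ./sketch' would not; a block size derived from that size would split the two cases into different compressed outputs for identical data.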
> A single sentence saying that the "defaults" are (of course?!!?!) dependent on the compression level would have shed some light.
I'll try to document it better in the manual and in the man page.
> (Having read the referenced section in the .info file in the source tarball, I would open an issue as "wishlist" asking for an option that would scale to "a reasonable number of" CPUs.)
As I said above, such an option would not work with data read from standard input, and would break reproducibility.
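A concrete way to see the reproducibility problem: plzip writes each data block as a separate lzip member, so the same input compressed with two different --data-size values produces byte-for-byte different (though equally valid) files. An option that derived the block size from the number of CPUs would therefore make the compressed output depend on the machine it was produced on.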
Best regards,
Antonio.