From: Antonio Diaz Diaz
Subject: Re: plzip: manual gives very false numbers, real defaults are huge!
Date: Wed, 08 May 2024 17:38:34 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Hi Steffen,

Steffen Nurpmeso wrote:
> #?0|kent:plzip-1.11$ cp /x/balls/gcc-13.2.0.tar.xz X1
> #?0|kent:plzip-1.11$ cp X1 X2
> [...]
> -rw-r----- 1 steffen steffen 89049959 May 7 22:14 X1.lz
> -rw-r----- 1 steffen steffen 89079463 May 7 22:14 X2.lz
Note that if you use incompressible files as input (gcc-13.2.0.tar.xz is already xz-compressed), you'll always obtain similar compressed sizes, no matter the compression level or the dictionary size. Try the test with gcc-13.2.0.tar and you'll see the difference. (As in your other test with /x/doc/coding/austin-group/202x_d4.txt.)
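For illustration, a quick comparison along these lines (just a sketch; it assumes both gcc-13.2.0.tar and gcc-13.2.0.tar.xz are in the current directory, and the output names are only placeholders):

  plzip -0 -c gcc-13.2.0.tar.xz > xz-0.lz   # already-compressed input
  plzip -9 -c gcc-13.2.0.tar.xz > xz-9.lz   # should be about the same size as xz-0.lz
  plzip -0 -c gcc-13.2.0.tar > tar-0.lz     # compressible input
  plzip -9 -c gcc-13.2.0.tar > tar-9.lz     # should be noticeably smaller than tar-0.lz
  ls -l *.lz

The first two sizes should differ very little, while the last two should show the usual gap between compression levels.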
> I think dynamically scaling according to the processors, taking into
> account the dictionary size, as you said above, is the sane approach for
> "saturating" with plzip. In the above job there are quite a lot of files
> of varying size (the spam DB being very large), and one recipe is not
> good for them all.
Maybe there is a better way (almost optimal for many files) to compress the spam DB that does not require a parallel compressor, but uses all the processors in your machine. (And, as a bonus, achieves maximum compression on files of any size and produces reproducible files).
  ls | xargs -n1 -P4 lzip -9

The command above should produce better results than a saturated plzip.
'ls' may be replaced by any way to generate a list of the files to be
compressed. See
http://www.gnu.org/software/findutils/manual/html_node/find_html/xargs-options.html
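A variant along the same lines (only a sketch; it assumes GNU find, GNU xargs, and coreutils nproc, and the find criteria should be adjusted to your layout) that copes with file names containing spaces and scales to the number of processors:

  find . -maxdepth 1 -type f ! -name '*.lz' -print0 |
    xargs -0 -n1 -P"$(nproc)" lzip -9

Each lzip process compresses one file at maximum compression, so the result is the same as compressing the files one by one, only faster; the '! -name' test merely skips files that are already .lz.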
Hope this helps,
Antonio.