lzip-bug

Re: plzip: manual gives very false numbers, real defaults are huge!


From: Steffen Nurpmeso
Subject: Re: plzip: manual gives very false numbers, real defaults are huge!
Date: Sun, 05 May 2024 02:18:28 +0200
User-agent: s-nail v14.9.24-621-g0d1e55f367

Hello Antonio.

Thanks for the quick response on a Saturday.

Antonio Diaz Diaz wrote in
 <6636B24B.4030902@gnu.org>:
 |Steffen Nurpmeso wrote:
 |> plzip: manual gives very false numbers, real defaults are huge!
 |
 |But if you pass options to plzip (-9 -n4) then you are no longer using the 
 |defaults. ;-)

mumble mumble

 |> but while compressing a 70MB file i realized it was not multithreaded.
 |
 |At compression level 9 and 4 threads you need at least a 256 MiB file. See 

I mean, that is a 70MB file!!

 |http://www.nongnu.org/lzip/manual/plzip_manual.html#Minimum-file-sizes

Thank you, but please no HTML pages unless absolutely necessary.
But yes it is true, CRUX Linux does not install any info pages by
policy, so i have never seen the above.
I honestly feel that approach is too scientific for me.

 |> (-n4 i never did; the manual however says two is default, which is
 |> not true, mind you)
 |
 |The man page (plzip.1) is a short reference created automatically with 
 |help2man from 'plzip --help', and reports the number of processors in my 
 |machine. You need to run 'plzip --help' yourself to see the number of 

I see.  (I did, *that* is four.  To my surprise, coming from the
manual.)

 |processors in your machine. The real manual can be accessed with 'info 
 |plzip' or at http://www.nongnu.org/lzip/manual/plzip_manual.html

No info here by policy.  Btw, the manual is not that long, not even
40KiB.  You need to be a computer scientist to get even the
notion from

 Note that the number of usable threads is limited by file size;
 on files larger than  a  few  GB plzip  can  use  hundreds  of
 processors, but on files of only a few MB plzip is no faster than
 lzip.

Ok "you get scaling effects", but 70 MiB is not "a few MiB".
To me.  67 megabytes per processor!  (How about doing a stat and
somehow taking into account st_size?  Or fstat, after the file was
opened?  Just a suggestion.  But, *here*, the difference for those
70 megabytes is notable.)
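
To make the st_size suggestion concrete, here is a minimal sketch.
It is not what plzip does; the helper names, the 1 MiB floor and the
clamping rule are all assumptions of mine.  The idea is only to cap
the level's data size so that every requested thread gets at least
one block.

```python
import os

MIB = 1024 * 1024

def suggest_data_size(file_size, threads, level_data_size=64 * MIB, floor=1 * MIB):
    """Hypothetical heuristic: shrink the data size for small inputs.

    level_data_size is what the compression level would pick (64 MiB
    for -9 according to this thread); floor is an arbitrary lower bound.
    """
    per_thread = -(-file_size // threads)  # ceiling division
    return max(floor, min(level_data_size, per_thread))

def suggest_data_size_for_path(path, threads, **kwargs):
    """Same heuristic, reading the size via stat(2) as suggested above."""
    return suggest_data_size(os.stat(path).st_size, threads, **kwargs)

# A 70 MiB input on 4 threads gets 17.5 MiB blocks instead of 64 MiB.
print(suggest_data_size(70 * MIB, 4))  # 18350080
```

For an input that is already open (or redirected from a regular
file), os.fstat on the descriptor would serve the same purpose.  The
trade-off is the one discussed further down: smaller blocks, slightly
larger output.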

 |> But the thing is, if i do
 |>
 |>    plzip -9 -n4 -B16000000 -c < 76-MiB-file > au.lz
 |>
 |> aka use the values the manual describes, then i get 263 and then
 |> 400 until the end, in my poor man's top(1).
 |
 |Correct. The command above first sets a --dictionary-size of 32 MiB and a 
 |--data-size of 64 MiB (with -9), then reduces the --data-size to 16MB
 |(with -B).

A single sentence saying that the "defaults" are (of course?!!?!)
dependent on the compression level would have been
enlightening.  (I mean, i am not the most enlightened one, but i
have over 25 years of daily "full-time" software work.  Hm.)
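
The arithmetic behind the numbers quoted above can be sketched like
this.  It is a simplified model that assumes each worker thread
compresses one --data-size block at a time; the function names are
mine and are not taken from plzip.

```python
MIB = 1024 * 1024

def busy_threads(file_size, data_size, threads):
    """Upper bound on threads that have a block to work on (simplified model)."""
    blocks = -(-file_size // data_size)  # ceiling division
    return min(threads, blocks)

def min_size_for_full_use(data_size, threads):
    """Smallest input that gives every thread one full block."""
    return data_size * threads

file_size = 76 * MIB

# -9 implies a 64 MiB data size: only two blocks for a 76 MiB file.
print(busy_threads(file_size, 64 * MIB, 4))         # 2
# -B16000000 lowers the data size: five blocks, so all four threads run.
print(busy_threads(file_size, 16_000_000, 4))       # 4
# The "at least a 256 MiB file" figure for -9 and 4 threads.
print(min_size_for_full_use(64 * MIB, 4) // MIB)    # 256
```

Under that model the 400 in top(1) with -B16000000 is what one would
expect, and so is the poor scaling without it.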

 |> Please let me make the statement that a default of 67 MiB for data
 |> size is really too much,
 |
 |The default (if you don't specify a compression level) is 16 MiB as 
 |documented. If you want a high compression level without altering the 
 |default --dictionary-size and --data-size, you can use
 |
 |   plzip -m273 -n4 -c < 76-MiB-file > au.lz

No, really not that.  -m is something for crypto-nerds and
mathematicians, not something for a normal user.

 |> In *my* opinion, the defaults should satisfy the occasional "i wanna
 |> compress something" lady (Diaz is south of Texas, is it?), instead of
 |> those "i want to compress my 10 GiB scientific database file" specialists.
 |
 |Maybe the occasional "i wanna compress something" lady would be better 
 |served by using plain lzip, which compresses equally well files of any size.

On her Windows gaming PC?
But i like it, xz here does

  $ xz -9 -T0 -c </tmp/rspamd.tar > au.lz
  xz: Reduced the number of threads from 4 to 1 to not exceed the memory usage limit of 1967 MiB
  $ xz -8 -T0 -c </tmp/rspamd.tar > au.lz
  xz: Reduced the number of threads from 4 to 2 to not exceed the memory usage limit of 1967 MiB
  $ xz -7 -T0 -c </tmp/rspamd.tar > au.lz
  ^C

only 2 are used.  It is fine with -6.  Just like plzip it then
fully utilizes my four cores, is only 25 percent faster and the
file is only a tiny bit smaller (69438329, 69335324).

I see.  PEBCAK that is called.  Well.

 |About malloc, my plan is to stick to standard usage. I do not plan to play 
 |with it in any way.

Well, i am silent now, but should i find the time i will, just for
fun, at some point post a patch that requires no memory
allocations at all.

 |Best regards,

(Having read the referenced section in the .info file in the
source tarball i would open an issue as "wishlist" asking for an
option that would scale-to-"a-reasonable-number-of"-cpus.  I could
imagine it would become a hit.  It is much, much faster, and, as
you say, only slightly larger.)

 |Antonio.
 --End of <6636B24B.4030902@gnu.org>

Thank you.  A nice Sunday i wish!
Greetings from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


