Re: plzip: manual gives very false numbers, real defaults are huge!
From: Antonio Diaz Diaz
Subject: Re: plzip: manual gives very false numbers, real defaults are huge!
Date: Tue, 07 May 2024 18:00:43 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Steffen Nurpmeso wrote:
The above means "on files of only a few MB plzip can't be faster than lzip,
no matter what options you use". Of course, at high compression levels the
"few MB" become "several tens of MB".
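A rough way to picture this limit, as a sketch and not plzip's actual code: the input is split into blocks of the data size (-B, which defaults to twice the dictionary size), and a file that fits in fewer blocks than there are threads cannot keep all threads busy. The helper name usable_threads and the one-block-per-worker model are assumptions made for illustration.

```shell
# Sketch: how many worker threads can be kept busy for a given input.
# Assumed model: the input is cut into blocks of data_size bytes and
# each block is compressed by a single worker thread.
usable_threads() {
    file_size=$1
    data_size=$2
    threads=$3
    # Number of blocks, rounded up, and never less than 1.
    blocks=$(( (file_size + data_size - 1) / data_size ))
    if [ "$blocks" -lt 1 ]; then blocks=1; fi
    # The useful thread count is the smaller of blocks and threads.
    if [ "$blocks" -lt "$threads" ]; then
        echo "$blocks"
    else
        echo "$threads"
    fi
}

# A 4 MiB file with 64 MiB blocks (the -9 default data size), 8 threads.
usable_threads $((4 * 1024 * 1024)) $((64 * 1024 * 1024)) 8
# A 256 MiB file with the same block size and thread count.
usable_threads $((256 * 1024 * 1024)) $((64 * 1024 * 1024)) 8
```

Under this model the first call reports a single usable thread and the second reports four, which is why small files gain nothing from more threads unless the block size is reduced.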
I think i now have understood your approach.
But i claim it is not what people would expect.
People tend to expect contradictory things. Like using all the processors
on any machine while at the same time producing the same compressed output
on all machines.
For example, if i hack just a little bit i get on my i5 laptop
#?0|kent:plzip-1.11$ time ./plzip -9 -n4 x1
instat=0x7fff32eb6800 inreg=1 sat=0 cf=680412 x=170103 tmp=67108864
USING 67108864
real 0m37.743s
user 0m37.737s
sys 0m0.273s
[...]
#?0|kent:plzip-1.11$ time ./plzip -9 -n0 x1
instat=0x7ffe538049d0 inreg=1 sat=1 cf=680412 x=170103 tmp=67108864
USING 170103
real 0m3.157s
user 0m12.415s
sys 0m0.087s
Note that the above does not run 12 times faster because you have 12
processors, but because you are using a dictionary size almost 200 times
smaller (which I guess will give a compression ratio between levels 0 and 1
instead of the level 9 requested).
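The ratios can be checked with a little arithmetic. This is only a sketch: it assumes that level -9 uses a 32 MiB dictionary and that the effective dictionary in the hacked run is capped at the 170103 bytes printed after "USING".

```shell
# Wall-clock ("real") times in seconds from the two runs quoted above.
slow=37.743
fast=3.157

# Assumption: -9 means a 32 MiB dictionary; the hacked run is assumed to
# be limited to the 170103-byte block size it reports.
dict_9=$((32 * 1024 * 1024))
dict_hack=170103

speedup=$(awk -v a="$slow" -v b="$fast" 'BEGIN { printf "%.1f", a / b }')
dict_ratio=$(awk -v a="$dict_9" -v b="$dict_hack" 'BEGIN { printf "%.0f", a / b }')
# user time divided by real time approximates the number of busy threads.
busy=$(awk 'BEGIN { printf "%.1f", 12.415 / 3.157 }')

echo "speedup: ${speedup}x  dictionary ratio: ${dict_ratio}x  busy threads: ${busy}"
```

The speedup comes out near 12, the dictionary ratio near 197, and the user/real ratio near 3.9, so roughly four threads were busy and the rest of the gain comes from the much smaller dictionary.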
I realized for the first time that standard input is treated in
a different way via the "one_to_one" mapping of yours. I.e., while
doing
time ./plzip -9 -n0 -c < /tmp/t.tar.xz > x1.lz
it occurred to me that the "struct stat" is not used at all for
stdin, which is a pity imho, especially since S_ISREG() is tested.
S_ISREG is not tested for stdin. But if you want to reproduce the metadata
of the input file in an output file with a different name, you can use
time ./plzip -9 /tmp/t.tar.xz -o x1.lz
That is true. If not a regular file, then the above saturation
will unfortunately not work out. Yet, i thought, limiting a
data size that the user did not explicitly set in the user
required saturation mode could at least minimize the damage a bit:
Or can cause the opposite damage by splitting a huge file into twice as many
members as now.
And i hope the people of reproducible-builds.org now always check their
environment before penalizing aka flagging other people's work.
Reproducible builds are a set of software development practices that create
an independently-verifiable path from source to binary code. They have
nothing to do with reproducible compression (obtaining identical compressed
output from identical uncompressed input read from anywhere).
i find myself using
ZEXE='plzip -9 -B16MiB -n'"$NPROC"' -c' ZEXT=lz
for this to not end up taking dozens of minutes.
I already gave you a solution; use -m or -s:
ZEXE='plzip -m273 -n'"$NPROC"' -c' ZEXT=lz
or
ZEXE='plzip -9 -s8MiB -n'"$NPROC"' -c' ZEXT=lz
The above would at least halve the necessary time.
Sure. The above is old and maybe totally useless when using
things like -k and -f. Hm.
I guess you could simplify it to something like this
ZEXE='plzip -m273 -n'"$NPROC"
$ZEXE -kf FILE || exit 5
Best regards,
Antonio.