Re: Fast lzma radix matchfinder
From: Antonio Diaz Diaz
Subject: Re: Fast lzma radix matchfinder
Date: Tue, 14 Jun 2022 18:23:43 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Adam Tuja wrote:
> The comparison here would be the same as with lzma, that is slightly faster. [1]
> Bigger advantage, beside compression speed, is revealed in memory consumption
> for multiple threads - it's halved for single thread but 1/4 for 2 threads and
> 1/8 for 4 threads [1][2].
Very interesting. Thank you for bringing this to my attention. I expect to
look at it in depth when I find the time, but I guess it may be difficult
(or impossible) to integrate it meaningfully into plzip because it seems
very different from what plzip does. See for example
https://github.com/conor42/fast-lzma2#readme
"Speed gains depend on the nature of the source data."
"The largest caveat is that the match-finder is a block algorithm, and to
achieve about the same ratio as 7-Zip requires double the dictionary size,
which raises the decompression memory usage."
>> it is not worth the trouble of breaking lzip's reproducibility
> Don't know what you mean by "reproducibility"
Lzip is more than a compressor. It is a set of tools designed around a
format tuned for long-term archiving. It is important that the output of
lzip does not change frequently between versions because such changes may
hinder some kinds of data recovery. See for example
http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Reproducing-one-sector
We need to think about the consequences of the consequences (sic) of any
change to the interface or to the algorithm.
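To illustrate why byte-identical output matters for this kind of recovery, here is a minimal Python sketch. It is only an analogy under stated assumptions: it uses Python's standard lzma module (xz format) rather than lzip, the function name repair_by_reproduction is hypothetical, and the real lziprecover procedure is the one described in the manual linked above. The idea is that a damaged region of a compressed file can be filled in from a fresh compression of a reference copy of the data, but only if the compressor still produces exactly the same bytes.

```python
import lzma


def repair_by_reproduction(damaged: bytes, reference_data: bytes,
                           start: int, length: int, preset: int) -> bytes:
    """Fill damaged[start:start+length] from a fresh compression of reference_data.

    Hypothetical helper for illustration only. It works only when the
    compressor reproduces the original compressed file byte for byte.
    """
    reference = lzma.compress(reference_data, preset=preset)
    if len(reference) != len(damaged):
        raise ValueError("fresh compression does not match the damaged file size")
    return damaged[:start] + reference[start:start + length] + damaged[start + length:]


# Deterministic, only moderately compressible sample data (decimal squares).
data = b"".join(str(i * i).encode() + b"\n" for i in range(20000))

good = lzma.compress(data, preset=6)

# Simulate a lost 512-byte "sector" by zeroing part of the compressed file.
corrupted = bytearray(good)
corrupted[512:1024] = bytes(512)
damaged = bytes(corrupted)

# Same data, same settings: the missing bytes can be reproduced.
repaired = repair_by_reproduction(damaged, data, 512, 512, preset=6)
```

If the encoder's output for a given level changed between versions, the fresh compression would no longer line up with the damaged file and the splice above would produce garbage, which is the concern with changing the algorithm.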
> but I didn't mean to replace current encoder/s, rather complement them.
> If it was used it could be different compression levels, like 11-19.
Increasing the number of levels also hinders data recovery.
Moreover, options like -11 or -19 are not compatible with POSIX or GNU
standards. See
http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax
Also, having a level 11 that compresses less than level 9 is confusing to users.
So these may also be difficult to integrate meaningfully into lzip.
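The syntax problem with options like -19 can be shown with a small sketch. It assumes a parser that follows the POSIX rule that single-character options may be grouped after one '-'; Python's standard getopt module is used here only as a stand-in, since lzip itself uses arg_parser, and the helper name parse_levels is hypothetical.

```python
import getopt


def parse_levels(argv):
    """Return the option names seen, treating each digit 0-9 as a short option.

    Hypothetical helper: the option string "0123456789" declares ten
    single-character options without arguments, as a POSIX-style parser would.
    """
    opts, _ = getopt.getopt(argv, "0123456789")
    return [name for name, _ in opts]


print(parse_levels(["-9"]))   # ['-9']
print(parse_levels(["-19"]))  # ['-1', '-9']: two grouped options, not "level 19"
```

Under these rules "-19" is indistinguishable from "-1 -9", so a two-digit level cannot be added without departing from the standard syntax.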
Best regards,
Antonio.