Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
Date: Sat, 21 Jul 2018 00:08:11 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Hi Ralph,
Ralph Corderoy wrote:
> > It is explained at http://www.nongnu.org/lzip/lzip_benchmark.html#xz2
>
> Thanks, it does indeed explain it. I skipped that section before
> because of its heading: `Lzip compresses large tarballs more than xz'.
> That read like a claim to me rather than my `Why is xz compressing more
> than lzip' FAQ that's explained within. :-)
Thanks for the hint. I have just reworded that heading and the next one
because both explain why xz seems to perform better than lzip in some
circumstances.
> It now gives the expected results with my large XCF file. Only 56 MiB
> was required for the dictionary.
As you can see, lzip adjusts the dictionary to the file size.
> $ stat -c '%s %n' * | sort -k1,1n -k2
> 20957117 foo.xcf.lzip-m64-s64MiB
> 21001368 foo.xcf.xz-9
You may perhaps obtain slightly better results with the shorter command
'lzip -9s26'.
>     2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
>     the specified size does not match one of the valid sizes, it will
>     be rounded upwards by adding up to (BYTES / 8) to it.
>
> Could info's last sentence be extended slightly with a clue why?
It has nothing to do with lzip's algorithm. It is simply for efficiency.
As the dictionary size is just the minimum size of the buffer needed to
decompress a file, it does not hurt to allocate a slightly larger
buffer. This allows the size to be coded in just one byte, instead of
the four bytes used by lzma-alone.
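The one-byte coding described above can be illustrated with a short sketch. This is not lzip's source, and it assumes the coding given in the lzip format documentation: the byte stores a base-2 exponent plus a count (0 to 7) of sixteenths of the base size to subtract, so the valid sizes have the form 2^n - k * 2^(n-4).

```python
MIN_SIZE = 1 << 12  # lzip's stated lower bound of 2^12 bytes

def quantize_dict_size(requested: int) -> int:
    """Round `requested` up to the nearest size expressible in one byte:
    2**n - k * (2**n // 16) for some exponent n and 0 <= k <= 7."""
    requested = max(requested, MIN_SIZE)
    base = MIN_SIZE
    while base < requested:
        base <<= 1                  # smallest power of two >= requested
    wedge = base >> 4               # one sixteenth of the base size
    # Subtract as many sixteenths as possible without dropping below
    # the requested size; adjacent valid sizes differ by one wedge.
    k = min(7, (base - requested) // wedge)
    return base - k * wedge
```

Since adjacent valid sizes differ by at most `base / 16`, and `base` is less than twice the requested size, the upward rounding stays within (BYTES / 8), matching the info manual's wording; for example, 56 MiB (Ralph's case) is itself a valid size (2^26 minus two sixteenths), so it passes through unchanged.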
> Also, I didn't think the info, or the man page which is always my first
> port of call, explicitly stated that the last setting wins, e.g. `-9 -s
> 64MiB' uses `-9's `-m' of 273, as I think the source currently shows.
Thanks. You are right. I'll make the info manual explicitly state this.
> Thanks for your help. My ~/bin/toxz, formerly `tobz2', `togz', `toZ',
> for converting already compressed files, has become `tolz'.
My pleasure. :-)
BTW, do you know zutils' zupdate?
http://www.nongnu.org/zutils/zutils.html
"zupdate recompresses files from bzip2, gzip, and xz formats to lzip
format. Each original is compared with the new file and then deleted.
Only regular files with standard file name extensions are recompressed,
other files are ignored. Compressed files are decompressed and then
recompressed on the fly; no temporary files are created."
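The zupdate flow quoted above can be sketched in a few lines. This is a hypothetical re-creation, not zutils code; since Python's standard library has no lzip support, this toy version recompresses .gz files to .xz instead of to lzip format.

```python
# Hypothetical sketch of zupdate's flow, NOT zutils code: decompress,
# recompress, verify the contents, and only then delete the original.
import gzip
import lzma
import os

def recompress_to_xz(path: str) -> str:
    """Recompress a .gz file to .xz, verifying before deleting the original."""
    if not path.endswith('.gz'):
        raise ValueError('only standard .gz names are handled')
    with gzip.open(path, 'rb') as f:
        data = f.read()                    # decompress
    new_path = path[:-3] + '.xz'
    with lzma.open(new_path, 'wb') as f:
        f.write(data)                      # recompress
    with lzma.open(new_path, 'rb') as f:
        if f.read() != data:               # compare with the new file
            raise RuntimeError('verification failed; original kept')
    os.remove(path)                        # then delete the original
    return new_path
```

Note that, like the real tool, the original is only removed after the new file has been read back and compared against the decompressed contents.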
> BTW, I've added a reference to Wikipedia's `LZMA' page, which covers
> `LZMA2' too. Hopefully, it will remain.
> https://en.wikipedia.org/w/index.php?title=Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm&diff=851144067&oldid=840725747
Thanks. Let's hope so. IIRC, a similar reference was deleted before.
I am surprised that the claim "LZMA2 supports arbitrarily scalable
multithreaded compression and decompression" is still in the Wikipedia
article, given the findings in http://www.nongnu.org/lzip/xz_inadequate.html
and the fact that xz-utils still does not implement parallel LZMA2
decompression after a decade. I think all parallel xz (de)compressors
use the same method as plzip (splitting the input file into independent
LZMA/LZMA2 members/blocks/streams). None of them seems to use the claimed
capabilities of LZMA2.
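The splitting method described here can be sketched briefly. The Python standard library has no lzip bindings, so in this illustration independent xz streams stand in for lzip members (an assumption for demonstration only, not plzip's actual code):

```python
# Sketch of the "independent members" approach: compress fixed-size
# chunks as separate, self-contained streams and concatenate them.
import lzma
from concurrent.futures import ThreadPoolExecutor

def parallel_compress(data: bytes, chunk_size: int = 1 << 20) -> bytes:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # CPython's lzma module releases the GIL while compressing, so the
    # chunks are genuinely compressed in parallel by the worker threads.
    with ThreadPoolExecutor() as pool:
        return b''.join(pool.map(lzma.compress, chunks))
```

Because `lzma.decompress` processes concatenated streams in sequence, the round trip recovers the original data, and decompression could be parallelized the same way by splitting the file at stream boundaries; none of this relies on any multithreading feature inside LZMA2 itself.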
Best regards,
Antonio.