Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
From: Ralph Corderoy
Subject: Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
Date: Fri, 20 Jul 2018 12:58:59 +0100
Hi Antonio,
> It is explained at http://www.nongnu.org/lzip/lzip_benchmark.html#xz2
Thanks, it does indeed explain it. I skipped that section before
because of its heading: `Lzip compresses large tarballs more than xz'.
That read like a claim to me rather than my `Why is xz compressing more
than lzip' FAQ that's explained within. :-)
It now gives the expected results with my large XCF file. Only 56 MiB
was required for the dictionary.
$ stat -c '%s %n' * | sort -k1,1n -k2
20957117 foo.xcf.lzip-m64-s64MiB
21001368 foo.xcf.xz-9
23299403 foo.xcf.lzip-9
23353544 foo.xcf.xz--lzma2=nice=273,dict=32MiB
55569138 foo.xcf
$
$ lzip -vl *.lzip*
   dict   memb  trail    uncompressed      compressed   saved  name
 32 MiB      1      0        55569138        23299403  58.07%  foo.xcf.lzip-9
 56 MiB      1      0        55569138        20957117  62.29%  foo.xcf.lzip-m64-s64MiB
                            111138276        44256520  60.18%  (totals)
$
lzip 1.20-1's documentation for `-s' says:

    '-s BYTES'
    '--dictionary-size=BYTES'
        When compressing, set the dictionary size limit in bytes. Lzip
        will use the smallest possible dictionary size for each file
        without exceeding this limit. Valid values range from 4 KiB to
        512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
        2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
        the specified size does not match one of the valid sizes, it will
        be rounded upwards by adding up to (BYTES / 8) to it.
I agree for `-s 4096' through to `-s 5120' that the rounding up adds up
to 511 bytes, that being no more than `BYTES / 8'.
    -s n            len(n)   round(n)   n+=
    [4096, 4097)         1       4096   [0, 1)
    [4097, 4609)       512       4608   [0, 512)
    [4609, 5121)       512       5120   [0, 512)
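
To check the table, and the 56 MiB dictionary reported above, here is a
rough Python sketch of the quantisation. It assumes the valid sizes are
those the lzip file format can encode in its one-byte dictionary-size
field: a power of two from 2^12 to 2^29, minus 0 to 7 sixteenths of
itself, and never below 4 KiB. The function names are mine, not lzip's.

```python
def valid_dictionary_sizes():
    """List every size assumed encodable: 2^n - k * 2^(n-4), n in 12..29, k in 0..7."""
    sizes = set()
    for log2 in range(12, 30):
        base = 1 << log2
        wedge = base // 16          # one sixteenth of the base size
        for k in range(8):
            size = base - k * wedge
            if size >= 4096:        # 4 KiB is the documented minimum
                sizes.add(size)
    return sorted(sizes)


def round_up_dictionary_size(n):
    """Return the smallest assumed-valid dictionary size that is >= n."""
    for size in valid_dictionary_sizes():
        if size >= n:
            return size
    raise ValueError("larger than the 512 MiB maximum")


print(round_up_dictionary_size(4097))      # 4608
print(round_up_dictionary_size(55569138))  # 58720256, i.e. 56 MiB
```

Under that assumption the step between neighbouring sizes is one eighth
of the power of two below them, which would account for the `BYTES / 8'
bound.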
Could info's last sentence be extended slightly with a clue why? I
would guess it's because the LZ77 dictionary is followed by what's yet
to be compressed and the chosen length can run into it, as in `ban.ana'
being `copy 3 from 2 back'.
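
For anyone unfamiliar with that kind of overlapping match, a minimal
Python sketch follows. It shows only the generic LZ77 copy step from my
`ban.ana' example, not lzip's actual decoder, and `copy_match' is a name
made up for illustration.

```python
def copy_match(out, distance, length):
    """Append `length` bytes, each read `distance` bytes back from the end.

    Copying one byte at a time lets the match run past the old end of the
    buffer and reuse bytes that this same copy has just written.
    """
    for _ in range(length):
        out.append(out[-distance])
    return out


buf = bytearray(b"ban")
copy_match(buf, distance=2, length=3)
print(buf.decode())  # banana
```

The third byte copied is the `a' written by the first step of the same
copy, so the length is allowed to exceed the distance.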
Also, I didn't think the info, or the man page which is always my first
port of call, explicitly stated that the last setting wins, e.g. `-9 -s
64MiB' uses `-9's `-m' of 273, as I think the source currently shows.
Thanks for your help. My ~/bin/toxz, formerly `tobz2', `togz', `toZ',
for converting already compressed files, has become `tolz'.
BTW, I've added a reference to Wikipedia's `LZMA' page, that covers
`LZMA2' too. Hopefully, it will remain.
https://en.wikipedia.org/w/index.php?title=Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm&diff=851144067&oldid=840725747
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy