Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.


From: Ralph Corderoy
Subject: Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
Date: Fri, 20 Jul 2018 12:58:59 +0100

Hi Antonio,

> It is explained at http://www.nongnu.org/lzip/lzip_benchmark.html#xz2

Thanks, it does indeed explain it.  I skipped that section before
because of its heading: `Lzip compresses large tarballs more than xz'.
That read like a claim to me rather than my `Why is xz compressing more
than lzip' FAQ that's explained within.  :-)

It now gives the expected results with my large XCF file.  Only 56 MiB
was required for the dictionary.

    $ stat -c '%s  %n' * | sort -k1,1n -k2
    20957117  foo.xcf.lzip-m64-s64MiB
    21001368  foo.xcf.xz-9
    23299403  foo.xcf.lzip-9
    23353544  foo.xcf.xz--lzma2=nice=273,dict=32MiB
    55569138  foo.xcf
    $
    $ lzip -vl *.lzip*
       dict   memb  trail  uncompressed  compressed   saved  name
      32 MiB     1      0      55569138    23299403  58.07%  foo.xcf.lzip-9
      56 MiB     1      0      55569138    20957117  62.29%  foo.xcf.lzip-m64-s64MiB
                              111138276    44256520  60.18%  (totals)
    $

lzip 1.20-1's documentation for `-s' says

    '-s BYTES'
    '--dictionary-size=BYTES'
         When compressing, set the dictionary size limit in bytes. Lzip
         will use the smallest possible dictionary size for each file
         without exceeding this limit. Valid values range from 4 KiB to
         512 MiB. Values 12 to 29 are interpreted as powers of two, meaning
         2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
         the specified size does not match one of the valid sizes, it will
         be rounded upwards by adding up to (BYTES / 8) to it.

I agree for `-s 4096' through to `-s 5120' that the rounding up adds up
to 511 bytes, that being no more than `BYTES / 8'.

    -s n          len(n)  round(n)  n+=
    [4096, 4097)  1       4096      [0, 1)
    [4097, 4609)  512     4608      [0, 512)
    [4609, 5121)  512     5120      [0, 512)
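
The rounding in the table above can be reproduced with a small Python
sketch.  It assumes, and this is an assumption about the lzip file
format rather than something stated in the quoted documentation, that a
valid size is a power of two (2^12 to 2^29) minus 0 to 7 sixteenths of
itself, with nothing below 4 KiB.  The names `valid_sizes' and
`quantize' are made up for illustration.

```python
MIN_LOG2, MAX_LOG2 = 12, 29  # 4 KiB to 512 MiB, as in the quoted text


def valid_sizes():
    """Yield each representable dictionary size, smallest first.

    Assumed encoding: base = 2**log2, minus k * (base / 16) for k in 0..7.
    """
    yield 1 << MIN_LOG2  # 4 KiB itself; anything smaller is out of range
    for log2 in range(MIN_LOG2 + 1, MAX_LOG2 + 1):
        base = 1 << log2
        wedge = base // 16
        for k in range(7, -1, -1):  # k = 7 gives the smallest size in this band
            yield base - k * wedge


def quantize(n):
    """Round n up to the nearest valid dictionary size."""
    for size in valid_sizes():
        if size >= n:
            return size
    raise ValueError("larger than 512 MiB")


for n in (4096, 4097, 4609, 55569138):
    print(n, quantize(n), quantize(n) - n)
```

Under that assumption, 4097 rounds to 4608 and 4609 to 5120, matching
the table, and the 55569138-byte XCF rounds to 58720256, i.e. the
56 MiB that `lzip -vl' reports.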

Could info's last sentence be extended slightly with a clue why?  I
would guess it's because the LZ77 dictionary is followed by what's yet
to be compressed and the chosen length can run into it, as in `ban.ana'
being `copy 3 from 2 back'.
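
To make the `ban.ana' example concrete, here is a minimal Python sketch
of an LZ77-style copy whose length exceeds its distance, so that it
reads bytes it has only just written.  It only illustrates the
overlapping match; it says nothing about whether that is the real
reason for the `BYTES / 8' wording, and `lz77_copy' is a made-up name,
not lzip code.

```python
def lz77_copy(out, distance, length):
    """Append `length` bytes, each taken from `distance` bytes back in `out`.

    Copying one byte at a time lets the source run into the bytes this
    same copy has just appended, which is what length > distance needs.
    """
    for _ in range(length):
        out.append(out[-distance])


buf = bytearray(b"ban")   # already decoded: the part before the dot
lz77_copy(buf, 2, 3)      # "copy 3 from 2 back"
print(buf.decode())       # banana
```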

Also, I didn't think the info, or the man page, which is always my
first port of call, explicitly stated that the last setting wins, e.g.
`-9 -s 64MiB' uses `-9's `-m' of 273, as I think the source currently
shows.

Thanks for your help.  My ~/bin/toxz, formerly `tobz2', `togz', `toZ',
for converting already compressed files, has become `tolz'.

BTW, I've added a reference to Wikipedia's `LZMA' page, that covers
`LZMA2' too.  Hopefully, it will remain.
https://en.wikipedia.org/w/index.php?title=Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm&diff=851144067&oldid=840725747

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


