Re: [Lzip-bug] Re: performance: gzip, lzip, xz
From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Re: performance: gzip, lzip, xz
Date: Tue, 13 Oct 2009 14:32:29 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905
Hello Jim,
Thanks for your interest in lzip. I hope I'll be able to convince you
that lzip is better than you think. :-)
Jim Meyering wrote:
> Claiming that xz has no clear goal seems mildly libelous.
I am not trying to discredit anybody. I am only stating that the xz
format is far from ready for general use.
Maybe xz has a clear goal, but I have been unable to discover what it
could be. Perhaps its goal is to find out the limit between format
flexibility and format security, given the number of times the xz format
had to be changed due to security problems.
Clearly, long-term stability is not the goal of xz. Just read the README
file for 4.999.9beta, line 51:
"Since the .xz format allows adding new filter IDs, it is possible that
some day there will be a filter that is, for example, much faster to
compress than LZMA2 (but probably with worse compression ratio).
Similarly, it is possible that some day there is a filter that will
compress better than LZMA2".
Will the old filters be removed as new ones are added, leaving users
without support for old files, or will xz become increasingly bloated by
old filters that almost nobody uses?
In any case, one does not need to be an IBM engineer to notice that xz's
goal is not as clear as lzip's:
http://lpar.ath0.com/2009/09/25/documentation-as-an-indicator-of-code-quality/
"Comparing the two, I see that xz has many more options. It has all
kinds of tweaks to specify how much memory it uses, tweak various
internal details of the LZMA algorithm, and filter the data. None of
these options are adequately explained. To quote Ted Nelson quoting
Roger Gregory, "An option means the programmer didn't have a clear idea
of what the module was supposed to do." Or as Steve Krug puts it, "Don't
make me think."
In contrast, lzip's user interface is much simpler, and closer to the
Unix philosophy of "do one thing, and do it well". The only two tweaks
to the LZMA algorithm lzip provides are adequately explained if you know
the basics of how compression algorithms tend to work, and there's a
table showing how they correspond to the compression levels -0 to -9.
The only borderline gratuitous option is to split the compressed file
into chunks, and that's at least a useful one. It also gets the SI units
right.
So, lzip wins by a landslide on UI and documentation".
> The .xz format is in no way an archive-like format. You cannot store
> file names in .xz, and .xz supports even less metadata than .gz.
By archiver-like I mean it is way too complicated for a general purpose
compressor and it includes features I have only found in archiver
formats, like the subblock filter.
> Regarding the possibility of recovery, there are not many differences
> between .xz and .lz.
There is an important difference: in case of data corruption, the xz
format can fail in a thousand more ways than the much simpler lzip format.
This is the reason lzip already has a recovery tool and XZ Utils does
not. Just compare the formats to see what I mean.
http://www.nongnu.org/lzip/manual/lzip_manual.html#File-Format
http://tukaani.org/xz/xz-file-format-1.0.4.txt
One inconsistency that can make even the detection of data corruption in
xz files difficult is that the format only requires implementations to
support CRC32[1], but the xz tool uses CRC64 by default[2].
[1] see xz-file-format-1.0.4.txt, line 353.
[2] see "man xz", line 362.
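
To make the role of the integrity check concrete, here is a minimal Python sketch. It is only an illustration: it uses zlib.crc32 from the standard library, it does not read or write .xz or .lz files, and the helper name is_intact and the sample data are hypothetical. A decoder that implements only the required CRC32 presumably has no way to perform this kind of comparison on a file whose check is a CRC64.

```python
import zlib

def is_intact(data: bytes, stored_crc: int) -> bool:
    """Return True if the CRC32 of data matches the stored check value."""
    return zlib.crc32(data) == stored_crc

# Hypothetical payload and the check value stored alongside it.
original = b"The quick brown fox jumps over the lazy dog"
stored = zlib.crc32(original)

# Simulate corruption by flipping a single bit.
damaged = bytearray(original)
damaged[5] ^= 0x01

print(is_intact(original, stored))        # True
print(is_intact(bytes(damaged), stored))  # False
```
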
> Claiming long-term stability of the .lz format is a stretch.
The lzip format is final. It offers the same capabilities as bzip2. If
some day I discover some better compression algorithm and decide to
implement it, I'll write a new compressor and format. Remember, "do one
thing, and do it well".
> The file format has changed at least once (probably twice, but I'm
> not sure) since the first stable release. Older versions of lzip
> cannot decompress new format files. The same can and (I'm sure) will
> happen with .xz too, but in case of .lz, it has been about adding basic
> features that .xz had in the first place.
The lzip format has changed exactly once since the first released version.
That revision made only two changes:
- The "member size" field was added to improve the recovery of undamaged
  members from multimember files.
- The coding of the dictionary size in the member header was extended to
  support more fine-grained values.
I do not see those changes as "basic features", and certainly data
recovery is not present in xz even now.
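
To illustrate the two changes, here is a minimal Python sketch. It is not code from lzip itself. It assumes the layout described in the File Format section of the manual linked above: a 6-byte header ("LZIP", a version byte, and a coded dictionary size byte) and a 20-byte trailer ending in an 8-byte little-endian member size. The function names are hypothetical, and the sketch only locates members; it does not decompress or verify them.

```python
import struct

HEADER_SIZE = 6    # assumed: "LZIP" magic, version byte, coded dictionary size
TRAILER_SIZE = 20  # assumed: CRC32 (4 bytes), data size (8), member size (8)

def decode_dict_size(ds: int) -> int:
    """Decode the dictionary size byte: bits 4-0 hold log2 of a base size,
    bits 7-5 hold how many sixteenths of the base size to subtract."""
    base = 1 << (ds & 0x1F)
    return base - (ds >> 5) * (base // 16)

def member_offsets(data: bytes) -> list:
    """Walk a multimember file backwards using each member size field and
    return the start offset of every member, in file order."""
    offsets = []
    end = len(data)
    while end > 0:
        if end < HEADER_SIZE + TRAILER_SIZE:
            raise ValueError("truncated member")
        # The member size is stored in the last 8 bytes of the trailer.
        (member_size,) = struct.unpack_from("<Q", data, end - 8)
        start = end - member_size
        if member_size < HEADER_SIZE + TRAILER_SIZE or start < 0:
            raise ValueError("implausible member size")
        if data[start:start + 4] != b"LZIP":
            raise ValueError("magic bytes not found at computed offset")
        offsets.append(start)
        end = start
    offsets.reverse()
    return offsets

print(decode_dict_size(0xD3))  # 327680, i.e. 320 KiB
```

Because each member records its own size, a tool can step from the end of the file to the start of every member without decoding any compressed data, which is what makes it possible to pick out undamaged members.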
Regards,
Antonio.