From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] lzip vs. zstd
Date: Thu, 20 Oct 2016 02:47:21 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
address@hidden wrote:
> Do you already have numbers, opinions, and maybe a comparison in
> reliability, speed, compression ratio, etc. against the new zstd?
I have used unzcrash to test the ability of the zstd decoder to detect corruption by itself (without a checksum), and the results are not good. As an example, here are the results of repeatedly decompressing the file COPYING.zst (a copy of the GPLv3), inverting a different bit each time so as to test all possible bit flips:
  11913 bytes tested
  95304 total decompressions
  56058 decompressions returned with zero status, of which
  56017 comparisons failed

The zstd decoder detects the corruption less than half of the time. Compare this with the lzip decoder, which detects about 99.99995% of the bit flips even without the help of its 3-factor integrity checking.
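In essence, the test loop looks like this (a minimal Python sketch, assuming a 'zstd' binary in the PATH and a test file compressed without a checksum, e.g. with 'zstd --no-check'; unzcrash itself is a separate tool and differs in detail, and the names here are illustrative):

  import subprocess

  def bit_flip_test(compressed_path, original_path):
      data = bytearray(open(compressed_path, 'rb').read())
      original = open(original_path, 'rb').read()
      zero_status = comparisons_failed = 0
      for i in range(len(data) * 8):          # one decompression per bit
          data[i // 8] ^= 1 << (i % 8)        # flip one bit
          result = subprocess.run(['zstd', '-d', '-q', '-c'],
                                  input=bytes(data), capture_output=True)
          if result.returncode == 0:          # decoder reported no error...
              zero_status += 1
              if result.stdout != original:   # ...but the output is corrupt
                  comparisons_failed += 1
          data[i // 8] ^= 1 << (i % 8)        # restore the bit
      print(len(data), 'bytes tested')
      print(len(data) * 8, 'total decompressions')
      print(zero_status, 'decompressions returned with zero status, of which')
      print(comparisons_failed, 'comparisons failed')

Every run where the decoder returns zero status but the output differs from the original is a corruption that only an external comparison (or a checksum) would have caught.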
Using 'zstd --no-check' is significantly less safe than using 'xz --check=none'.
Even with integrity checking enabled, my guess is that it is at least a million times more likely to get a false negative (undetected corruption) from zstd than from lzip. (The decoders alone miss about 59% versus about 0.00005% of bit flips, a ratio of roughly a million, so with comparable checksums the overall false-negative rates should differ by a similar factor.)
The zstd file format has many of the defects of the xz format[1]: unprotected lengths, unprotected flags, unprotected dictionary IDs, optional integrity checking, and optional file concatenation; and it does not seem to admit trailing data. Also, the current version of the zstd file format is 0.2.0, which may mean that changes to the format are still expected.
Zstd is described as a "fast real-time compression algorithm". AFAIK, its author does not recommend zstd for long-term archiving.
So my advice is that you should not use zstd for long-term archiving.

[1] http://www.nongnu.org/lzip/xz_inadequate.html

Juan Francisco Cantero Hurtado asked me if I knew why the tests of zstd take so long to finish.
It seems that 'make test' takes a lot of time (17 minutes) because it is a full regression test, not just a small test with a few files to verify that the compilation went well, as most programs run. The theoretical basis of zstd[2] seems more complicated than that of LZMA, and the author probably wants to make sure that any possible bug is caught early.
[2] Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding. https://arxiv.org/abs/1311.2540
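To give an idea of what that paper describes, here is a toy range-ANS (rANS) coder. This is only a sketch of the idea, not zstd's implementation: zstd uses tabled ANS (FSE) with finite-precision, renormalized states, while this version leans on Python's unbounded integers and skips renormalization entirely; the frequencies and message are illustrative.

  FREQ = {'a': 3, 'b': 1}                 # illustrative symbol frequencies
  M = sum(FREQ.values())                  # total of all frequencies
  CUM, acc = {}, 0                        # cumulative frequency per symbol
  for sym, f in FREQ.items():
      CUM[sym], acc = acc, acc + f

  def encode(symbols):
      x = 1                               # any positive start state works here
      for s in reversed(symbols):         # ANS encodes in reverse order
          x = (x // FREQ[s]) * M + CUM[s] + (x % FREQ[s])
      return x                            # the whole message is one integer

  def decode(x, n):
      out = []
      for _ in range(n):                  # symbols come out in forward order
          slot = x % M
          s = next(t for t in FREQ if CUM[t] <= slot < CUM[t] + FREQ[t])
          out.append(s)
          x = FREQ[s] * (x // M) + slot - CUM[s]
      return ''.join(out)

  msg = 'abaa'
  assert decode(encode(msg), len(msg)) == msg

Making a coder like this both fast and exactly reversible at finite precision is where much of the complexity comes in, which would explain the desire for exhaustive testing.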
Best regards, Antonio.