Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Date: Thu, 18 May 2017 01:45:25 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14
Hello Damir,
Damir wrote:
> Have you considered choosing a different polynomial for CRC32 calculation
> in the lzip file format?
Yes, but I found no compelling reason to change.
> Some recent CPUs (x86_64 SSE4.2, PowerPC ISA 2.07, ARM v8.1) offer
> hardware-accelerated calculation of CRC32 with a different polynomial
> (crc32c) than the one used in lzip (ethernet crc32).
Maybe hardware-accelerated calculation of the ethernet CRC32 also exists.
After all, it is the same polynomial used by gzip and zlib.
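
For reference, the two CRCs differ only in the generator polynomial. The
following is a minimal bit-at-a-time sketch (not lzip's actual code); the
constants are the standard reflected forms of the two polynomials:

/* Minimal sketch, not lzip's implementation: a reflected, bit-at-a-time CRC
   where the only difference between the two variants is the polynomial. */
#include <stdint.h>
#include <stddef.h>

#define POLY_CRC32  0xEDB88320u  /* ethernet/gzip/zlib/lzip polynomial, reflected */
#define POLY_CRC32C 0x82F63B78u  /* Castagnoli (crc32c) polynomial, reflected */

static uint32_t crc32_bitwise( const uint8_t *buf, size_t len, uint32_t poly )
  {
  uint32_t crc = 0xFFFFFFFFu;            /* initial value */
  for( size_t i = 0; i < len; ++i )
    {
    crc ^= buf[i];
    for( int b = 0; b < 8; ++b )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ poly : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;              /* final inversion */
  }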
> So, picking the crc32c poly instead has two benefits:
> 1) hardware accelerated integrity checking
Hardware acceleration of CRC calculation makes sense for storage devices
because the data is just moved; no other processing is done to it, so
calculating the CRC is the only computation involved.
But calculating the CRC is just a small part of the total decompression
time, so even if you accelerate it, the total speed gain is small
(probably smaller than 5%). For compression the speed gain is even smaller.
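
For what it is worth, hardware-accelerated CRC32C on x86_64 looks roughly
like the sketch below (assumes SSE4.2 and a compiler flag such as -msse4.2;
note that the CRC32 instruction implements only the Castagnoli polynomial,
not the one lzip uses):

/* Sketch only, assuming x86_64 with SSE4.2: hardware CRC32C via intrinsics.
   The instruction computes the Castagnoli polynomial, not the ethernet one. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>

static uint32_t crc32c_hw( const uint8_t *buf, size_t len )
  {
  uint64_t crc = 0xFFFFFFFFu;
  size_t i = 0;
  for( ; i + 8 <= len; i += 8 )          /* 8 bytes per instruction */
    {
    uint64_t word;
    memcpy( &word, buf + i, 8 );
    crc = _mm_crc32_u64( crc, word );
    }
  for( ; i < len; ++i )                  /* remaining bytes */
    crc = _mm_crc32_u8( (uint32_t)crc, buf[i] );
  return (uint32_t)crc ^ 0xFFFFFFFFu;
  }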
> 2) better protection against undetected errors
You will need to prove this one.
CRC32C has a slightly larger Hamming distance than ethernet CRC32 for
"small" packet sizes (see pages 3-4 of [1]). But beyond some size, perhaps
not much larger than 128 KiB, both have the same HD of 2. For files
larger than that (uncompressed) size, there is little difference between
the two CRCs.
[1] http://users.ece.cmu.edu/~koopman/networks/dsn02/dsn02_koopman.pdf
Even more important, we are talking about the interaction between
compression and integrity checking. The difference between a Hamming
distance of 2 and a Hamming distance of 3 is probably immaterial here.
Maybe you would like to read section 2.10 of [2]. I quote:
"Verification of data integrity in compressed files is different from
other cases (like Ethernet packets) because the data that can become
corrupted are the compressed data, but the data that are verified (the
dataword) are the decompressed data. Decompression can cause error
multiplication; even a single-bit error in the compressed data may
produce any random number of errors in the decompressed data, or even
modify the size of the decompressed data."
[2] http://www.nongnu.org/lzip/xz_inadequate.html
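
Error multiplication is easy to observe. The rough sketch below (an assumed
setup using zlib's raw deflate, which has no built-in check, rather than
lzip's format) flips a single bit in the compressed stream and counts how
many decompressed bytes change:

/* Rough sketch, assumed setup: demonstrate error multiplication with zlib's
   raw deflate. A single flipped bit in the compressed data may corrupt many
   decompressed bytes or change the decompressed size. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main( void )
  {
  static unsigned char src[4096], comp[8192], out[8192];
  for( int i = 0; i < 4096; ++i ) src[i] = (unsigned char)( i * 31 );

  z_stream ds; memset( &ds, 0, sizeof ds );     /* compress with raw deflate */
  deflateInit2( &ds, Z_DEFAULT_COMPRESSION, Z_DEFLATED, -15, 8,
                Z_DEFAULT_STRATEGY );
  ds.next_in = src; ds.avail_in = sizeof src;
  ds.next_out = comp; ds.avail_out = sizeof comp;
  deflate( &ds, Z_FINISH );
  const uLong csize = ds.total_out;
  deflateEnd( &ds );

  comp[csize/2] ^= 0x10;              /* single-bit error in compressed data */

  z_stream is; memset( &is, 0, sizeof is );
  inflateInit2( &is, -15 );           /* raw deflate: no integrity check */
  is.next_in = comp; is.avail_in = csize;
  is.next_out = out; is.avail_out = sizeof out;
  const int ret = inflate( &is, Z_FINISH );
  const uLong osize = is.total_out;
  inflateEnd( &is );

  unsigned long diffs = 0;
  for( uLong i = 0; i < osize && i < sizeof src; ++i )
    if( out[i] != src[i] ) ++diffs;
  printf( "inflate returned %d; %lu bytes decompressed, %lu differ\n",
          ret, (unsigned long)osize, diffs );
  return 0;
  }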
> The downside is the compatibility problem, but changing the version byte in
> the file header can help with that.
This is a very large downside, most probably to gain almost nothing.
IMO, one of the big problems of today's software development is that too
many people are willing to complicate the code without the slightest
proof that the proposed change is indeed an improvement.
Best regards,
Antonio.