bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Memory requirements for CRC32 calculation


From: Pádraig Brady
Subject: Re: Memory requirements for CRC32 calculation
Date: Thu, 5 Dec 2024 16:19:40 +0000
User-agent: Mozilla Thunderbird Beta

On 05/12/2024 14:31, Sam Russell wrote:
Hi,

I've been looking into other CRC32 implementations for architectures that don't have accelerated 
polynomial multiplication (PCLMUL/VMULL for x86-SSE/AVX or ARM-NEON), and there are improvements we 
can make, but they make tradeoffs for memory (both for lookup tables and for working set). The 
"braiding" approach from zlib seems to be the best publicly available algorithm, and I've 
also been developing my own algorithm "chorba" that outperforms both slicing and braiding 
depending on the circumstances (it ends up needing to be tuned per CPU).

Memory requirements:
Sarwate (Simon's method from RFC1952): 1kB lookup table
Slicing by 8: 8kB lookup table
Braiding: 8kB lookup table
Chorba 1952: 2kB working set
Chorba 57082: 64kB working set
Chorba 703380: 1MB working set
Chorba 19824364: 32MB working set

In terms of performance, braiding seems to be 1.2-3x faster than slicing by 8 
across different systems, whereas chorba depends on the CPU (x86_64 gets better 
with higher working set, is even at 64kB and better with 1MB or 32MB, ARM seems 
to be better cross the board, Raspberry Pi 4 is better at 2kB and gets worse 
after that).

What sort of parameters would be important and what expectations will 
users/maintainers have? For example, coreutils cksum reads a rolling 65kB 
buffer and operates on that, whereas gzip seems to take much larger chunks; are 
there any systems that would appreciate a low-mem mode? Is it a downside that 
different architectures would need to choose different parameters to benefit 
from the speedup?

From the description above, "braiding" seems like a good default.
The main concern I'd have with the others is regressions on some platforms,
not the working set in particular (well <= 64 KiB anyway).

cheers,
Pádraig



reply via email to

[Prev in Thread] Current Thread [Next in Thread]