Re: [PATCH] crc: New optimised slice-by-8 implementation

> How about adding something like this to modules/crc:

Done

> People using the crc module who want to disable the faster/larger CRC32

> implementation can add 'gl_crc_slice_by_8=yes' to their configure.ac
> before invoking gnulib.

I've been running tests with `./gnulib-tool --with-tests --test crc` so I'm interested in a way of enabling this flag from gnulib-tool

> I still think we should check larger alignment mismatches in the test

> case, it doesn't hurt to test wildly larger mis-alignment for surprising
> behaviour.

I'm not against the idea, I just want to know what error we are detecting with these tests. The largest word we're working with is 64-bits long, and I know some CPUs will fault when these are read from a memory address that isn't 64-bit aligned, but I'm not aware of this happening on other boundaries. If we imagine CPU x that only allows you to read the first 64-bit word of every 65536bits then it would be impossible to natively address an array of uint64_t, for example. I feel it's safe to assume that if we test every alignment from 0-7 (or 0-15 to handle a potential pclmul implementation that reads 128-bit words at a time) then we will catch any alignment issues.

I do believe that unit tests need to be "as simple as possible, but no simpler"; each unit test should be solving one problem well.

> You need to add license terms (GPLv3+?) and follow GNU coding style

Done

> If code is copied then licensing terms has to be clear. RFCs have bad licensing terms, so maybe we cannot use this code at all.

The existing lib/crc.c uses code directly taken from RFC 1952 and is referenced as such, I sent off my copyright paperwork and mentioned this to them and am awaiting a reply.

> That memcpy looks weird, does it really do what you want it to do on all
> architectures? Would using something from endian.h be cleaner?

The purpose is to work around alignment issues. The signature for crc32_update_no_xor() takes a char* and casting up to a uint64_t is undefined by C. Using memcpy lets the compiler and runtime be smart and insulates us from nonaligned memory accesses. I am open to other ideas around this (we can step forward byte by byte until we reach an aligned pointer and then do 8-byte blocks from there, for example, but then we need to detect alignment and do the upcasting ourselves)

You are correct that this breaks under big endian though, thanks for the tip for endian.h. I've added a le64toh() call and this should make it work correctly on big endian systems (how do we go about testing these btw? I don't have one on me).

On Wed, 16 Oct 2024 at 09:05, Simon Josefsson <simon@josefsson.org> wrote:

Sam Russell <sam.h.russell@gmail.com> writes:

> This is my implementation of the slice-by-8 algorithm for CRC32 generation.

Thanks!

> I've added a flag CRC_ENABLE_SLICE_BY_8, I'd appreciate if someone can give
> me a hint on how to set this up in the makefile config. I get the
> impression that we want this to be on by default or for the major
> architectures, but still allow people to disable it easily at build
> time.

How about adding something like this to modules/crc:

if test $gl_crc_slice_by_8 != no; then
AC_DEFINE([GL_CRC_SLICE_BY_8], [1],
[Define to get faster but larger CRC32 operation.])
fi

I suggest another name for the identifier GL_CRC_SLICE_BY_8: 'GL_'
prefix for gnulib, and drop the ENABLE since that is implied by the
value.

People using the crc module who want to disable the faster/larger CRC32
implementation can add 'gl_crc_slice_by_8=yes' to their configure.ac
before invoking gnulib.

Do we have a better pattern for this?

> Tests are in the previous patch, coverage there revolves around potential
> alignment issues and using longer sample data to make sure we're hitting
> the codepaths for length > 8 and length not evenly divisible by 8.

I still think we should check larger alignment mismatches in the test
case, it doesn't hurt to test wildly larger mis-alignment for surprising
behaviour.

> + * lib/crc-generate-table.c: Generation code for CRC32 lookup tables
...
> +++ b/lib/crc-generate-table.c

You need to add license terms (GPLv3+?) and follow GNU coding style:

https://www.gnu.org/prep/standards/standards.html#Writing-C

I don't think this file has to be part of the crc module, it is never
used as far as I can tell. Having the generator code around is
important, for completeness. Compare the gzip generator code:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73814

That code doesn't have clear licensing terms, so I don't think it should
be used, but it is an example of similar code.

> +// taken from RFC 1952

Don't use //, see the C style guide.

If code is copied then licensing terms has to be clear. RFCs have bad
licensing terms, so maybe we cannot use this code at all.

It would be good to add comments in crc.c before the tables explaining
how to generate them.

> +uint32_t
> +crc32_update_no_xor_slice_by_8 (uint32_t crc, const char *buf)
> +{
> + uint64_t local_buf;
> + memcpy(&local_buf, buf, 8);

That memcpy looks weird, does it really do what you want it to do on all
architectures? Would using something from endian.h be cleaner?

> +uint32_t
> +crc32_update_no_xor_slice_by_n (uint32_t crc, const char *buf, size_t num_bytes)
> +{
> + uint64_t local_buf;
> + memcpy(&local_buf, buf, num_bytes);

Same here.

> + local_buf = local_buf ^ crc;
> +
> + if (num_bytes >= 4) {
> + crc = 0;
> + }
> + else {
> + crc = crc >> (num_bytes * 8);
> + }

No need for curly braces here.

> + for (size_t i = 0; i < num_bytes; i++) {

I'd move the variable declaration first in this function.

/Simon

From:	Sam Russell
Subject:	Re: [PATCH] crc: New optimised slice-by-8 implementation
Date:	Wed, 16 Oct 2024 12:57:09 +0200