Re: [PATCH] wc: speed-up by simplifying avx code


From: Pádraig Brady
Subject: Re: [PATCH] wc: speed-up by simplifying avx code
Date: Sun, 31 Mar 2024 21:23:11 +0100
User-agent: Mozilla Thunderbird

On 31/03/2024 18:58, Evgeny Nizhibitsky wrote:
Yes, it's true that simplifying the code and speeding it up by increasing the
bufsize are two different things, although the former allowed the latter.

I just ran more tests with hyperfine across several configurations: the current
master version, and the new approach with bufsizes ranging from 16 KiB up to 1 MiB.
The inputs were 1 billion lines of yes output like you used (1by), a generated file
for the recent 1 billion row challenge (1brc, with entries like
"<station name>;<temperature:0.2f>"), and the first 100 million rows of each
(100my and 100mrc, respectively). All files were in /dev/shm, again on a 7800X3D.

The reported timings are as follows:

| version | 100my | 100mrc | 1by | 1brc |
| ------- | ------- | ------- | ------- | ------- |
| master | 21.3 ms ± 1.0 ms | 163.1 ms ± 1.5 ms | 197.1 ms ± 3.0 ms | 1.680 s ± 0.010 s |
| 16 KiB | 21.0 ms ± 1.1 ms | 163.7 ms ± 2.1 ms | 194.3 ms ± 2.5 ms | 1.658 s ± 0.015 s |
| 32 KiB | 20.2 ms ± 0.7 ms | 158.9 ms ± 3.0 ms | 194.6 ms ± 6.4 ms | 1.620 s ± 0.023 s |
| 64 KiB | 19.8 ms ± 0.6 ms | 154.0 ms ± 5.3 ms | 187.5 ms ± 7.2 ms | 1.553 s ± 0.013 s |
| 128 KiB | 18.8 ms ± 0.6 ms | 148.9 ms ± 5.4 ms | 178.4 ms ± 1.3 ms | 1.530 s ± 0.013 s |
| 256 KiB | 19.2 ms ± 0.8 ms | 145.8 ms ± 1.5 ms | 176.4 ms ± 1.6 ms | 1.522 s ± 0.017 s |
| 512 KiB | 19.6 ms ± 0.7 ms | 146.4 ms ± 1.0 ms | 183.0 ms ± 5.0 ms | 1.512 s ± 0.014 s |
| 1 MiB | 19.3 ms ± 0.7 ms | 145.7 ms ± 1.8 ms | 188.4 ms ± 6.2 ms | 1.499 s ± 0.012 s |

And the corresponding speed-up values are as follows:

| version | 100my | 100mrc | 1by | 1brc |
| ------- | ------- | ------- | ------- | ------- |
| master | 0% | 0% | 0% | 0% |
| 16 KiB | 1% | 0% | 1% | 1% |
| 32 KiB | 5% | 3% | 1% | 4% |
| 64 KiB | 8% | 6% | 5% | 8% |
| 128 KiB | 13% | 10% | 10% | 10% |
| 256 KiB | 11% | 12% | 12% | 10% |
| 512 KiB | 9% | 11% | 8% | 11% |
| 1 MiB | 10% | 12% | 5% | 12% |
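
For anyone wanting to reproduce the percentages, the following is a minimal sketch of how they appear to be derived from the mean timings. The formula (baseline / candidate - 1, rounded to a whole percent) is an assumption inferred from the numbers, not something stated in the thread, and the function name is illustrative only.

```python
# Mean timings in milliseconds, copied from the master and 256 KiB rows
# of the timings table (1brc converted from seconds to milliseconds).
master = {"100my": 21.3, "100mrc": 163.1, "1by": 197.1, "1brc": 1680.0}
buf_256k = {"100my": 19.2, "100mrc": 145.8, "1by": 176.4, "1brc": 1522.0}


def speedup_percent(baseline_ms, candidate_ms):
    """Assumed definition: how much faster the candidate is, in whole percent."""
    return round((baseline_ms / candidate_ms - 1) * 100)


speedups = {name: speedup_percent(master[name], buf_256k[name]) for name in master}
print(speedups)
```

Under that assumption the result matches the 256 KiB row of the speed-up table (11%, 12%, 12%, 10%). The sketch ignores the ± spread, which for several cells is comparable in size to the differences between neighbouring bufsizes.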

So again, in my case the new approach is on par with the old one, while a
bufsize of 256 KiB seems to be the sweet spot that brings the best value.
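
To make the role of the bufsize concrete, here is a rough Python sketch of a line counter that reads its input in fixed-size chunks. It is not the wc code from the patch (which is C with AVX2 intrinsics); the function name and the 256 KiB default are assumptions chosen for illustration.

```python
import io


def count_lines(stream, bufsize=256 * 1024):
    """Count newline bytes in a binary stream, reading bufsize bytes at a time."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(view)  # fills at most bufsize bytes
        if not n:                  # 0 bytes read means end of input
            break
        total += buf.count(b"\n", 0, n)  # only scan the bytes just read
    return total


# Small stand-in for the "yes" input used in the benchmarks.
sample = b"y\n" * 1000
print(count_lines(io.BytesIO(sample)))
print(count_lines(io.BytesIO(sample), bufsize=16 * 1024))
```

A larger buffer means fewer read calls for the same input, which is one plausible reason for the gains in the tables; the actual cause on a given CPU would need profiling to confirm.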

Still more testing on different CPUs and sample files should probably be 
conducted.

Excellent.
This concurs with my testing with this patch on my laptop,
and my testing of 256KiB buffer sizes with:
https://github.com/coreutils/coreutils/commit/fcfba90d0

I'll test on a few other systems and adjust the configure check before 
committing.

thanks!
Pádraig


