From: Bruno Haible
Subject: new module 'mbuiterf'
Date: Tue, 18 Jul 2023 13:55:38 +0200

This set of patches adds a new module 'mbuiterf', similar to 'mbuiter',
but faster. The 'f' stands for "faster" or "functional style".

Kudos to Paul Eggert for the intuition that a function that returns values
is more efficient than an equivalent function that modifies state and
returns void.
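
To illustrate the difference between the two styles, here is a minimal
sketch. It is not the code of mbuiter.h or mbuiterf.h: the names
'struct ch', 'struct iter_state', 'stateful_load' and 'functional_next'
are hypothetical, and only single-byte characters are handled. In the
first style the current character lives in an iterator object that a
void function fills in; in the second style the character is returned
by value, which may make it easier for the compiler to keep it in
registers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified character record (not gnulib's mbchar_t).  */
struct ch
{
  const char *ptr;   /* start of the character */
  size_t bytes;      /* number of bytes it occupies */
  int value;         /* its code; 0 marks the end of the string */
};

/* Style 1: the iterator object holds the current character, and a
   function returning void modifies that object.  */
struct iter_state
{
  const char *pos;
  bool cur_valid;
  struct ch cur;
};

static void
stateful_load (struct iter_state *it)
{
  if (!it->cur_valid)
    {
      it->cur.ptr = it->pos;
      it->cur.bytes = 1;   /* assumption: single-byte characters only */
      it->cur.value = (unsigned char) *it->pos;
      it->cur_valid = true;
    }
}

size_t
count_stateful (const char *s)
{
  struct iter_state it = { s, false, { NULL, 0, 0 } };
  size_t n = 0;
  for (;;)
    {
      stateful_load (&it);
      if (it.cur.value == 0)
        break;
      n++;
      it.pos += it.cur.bytes;
      it.cur_valid = false;
    }
  return n;
}

/* Style 2: the function returns the character by value and leaves
   advancing the pointer to the caller.  */
static struct ch
functional_next (const char *pos)
{
  struct ch c = { pos, 1, (unsigned char) *pos };
  return c;
}

size_t
count_functional (const char *s)
{
  size_t n = 0;
  const char *p = s;
  while (*p != '\0')
    {
      struct ch c = functional_next (p);
      n++;
      p += c.bytes;
    }
  return n;
}
```

Both functions count the characters of a NUL-terminated single-byte
string; they differ only in where the current character is stored.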
The benchmarks clearly show the speedup, especially for the case of ASCII
text (CPU time, measured on x86_64, with gcc 13, on an AMD Ryzen 7 CPU):

              gcc                 clang
       mbuiter  mbuiterf    mbuiter  mbuiterf
  a      0.990     0.205      1.225     1.038
  b      1.032     0.207      1.232     1.054
  c      2.323     1.212      2.607     2.345
  d      2.036     0.905      2.358     2.076
  e      2.120     0.953      2.358     2.083
  f     15.335    15.036     15.307    15.496
  g     10.402     9.726     10.636    10.382
  h     11.082    10.223     11.324    10.899
  i      4.846     4.713      4.882     4.922
  j      5.151     4.919      5.097     5.137

The speedup of the ASCII case with gcc is understandable when one looks
at the generated code: The inner loop in bench-mbuiter's do_test is

.L8:
        addq    %rax, %rbx
        movq    120(%rsp), %rax
        addq    %rax, 112(%rsp)
        movb    $0, 104(%rsp)
.L6:
        movq    112(%rsp), %r14
        cmpb    $0, (%r14)
        js      .L9
        movq    $1, 120(%rsp)
        movsbl  (%r14), %eax
        movb    $1, 128(%rsp)
        movl    %eax, 132(%rsp)
        movb    $1, 104(%rsp)
.L7:
        testl   %eax, %eax
        jne     .L8

whereas the inner loop in bench-mbuiterf's do_test is

.L15:
        testb   %al, %al
        js      .L6
        addq    %rax, %rbx
        movl    $1, %eax
        addq    %rax, %r14
        movsbq  (%r14), %rax
        testb   %al, %al
        jne     .L15

That's nearly optimal.
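
The shape of that loop, a sign test on each byte with a jump to a slower
path only for non-ASCII bytes, can be approximated in portable C. The
following is a rough sketch, not the benchmark's do_test and not the
mbuiterf macros: the function name 'count_chars' is hypothetical, the
slow path uses the standard mbrtowc instead of gnulib's own decoding,
and it assumes an ASCII-compatible encoding in which every byte below
0x80 is a complete character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters of the NUL-terminated string S in the current
   locale's encoding.  Hypothetical helper for illustration only.  */
size_t
count_chars (const char *s)
{
  mbstate_t st;
  size_t remaining = strlen (s);
  size_t n = 0;

  memset (&st, 0, sizeof st);
  while (remaining > 0)
    {
      size_t len;

      if ((unsigned char) *s < 0x80)
        /* Fast path: an ASCII byte is one character of one byte.  */
        len = 1;
      else
        {
          /* Slow path: let the C library decode the sequence.  */
          len = mbrtowc (NULL, s, remaining, &st);
          if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
            {
              /* Invalid or incomplete sequence: count the byte as one
                 character and restart from the initial state.  */
              len = 1;
              memset (&st, 0, sizeof st);
            }
        }
      s += len;
      remaining -= len;
      n++;
    }
  return n;
}
```

For pure ASCII input only the first branch runs, which corresponds to
the case where the benchmark shows the largest gain.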
The module 'mbuiter' is still recommended for more complicated code
that is not performance critical, because it has a simpler idiom.
For this reason, I'm not optimizing the modules 'mbsstr', 'mbscasestr',
'regex-quote', 'propername', 'exclude': Their code is more readable
with the 'mbuiter' macros.

2023-07-18  Bruno Haible  <bruno@clisp.org>

	mbsspn: Optimize.
	* lib/mbsspn.c: Include mbuiterf.h instead of mbuiter.h.
	(mbsspn): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbsspn (Depends-on): Add mbuiterf. Remove mbuiter.

	mbscspn: Optimize.
	* lib/mbscspn.c: Include mbuiterf.h instead of mbuiter.h.
	(mbscspn): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbscspn (Depends-on): Add mbuiterf. Remove mbuiter.

	mbspbrk: Optimize.
	* lib/mbspbrk.c: Include mbuiterf.h instead of mbuiter.h.
	(mbspbrk): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbspbrk (Depends-on): Add mbuiterf. Remove mbuiter.

	mbspcasecmp: Optimize.
	* lib/mbspcasecmp.c: Include mbuiterf.h instead of mbuiter.h.
	(mbspcasecmp): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbspcasecmp (Depends-on): Add mbuiterf. Remove mbuiter.

	mbsncasecmp: Optimize.
	* lib/mbsncasecmp.c: Include mbuiterf.h instead of mbuiter.h.
	(mbsncasecmp): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbsncasecmp (Depends-on): Add mbuiterf. Remove mbuiter.

	mbscasecmp: Optimize.
	* lib/mbscasecmp.c: Include mbuiterf.h instead of mbuiter.h.
	(mbscasecmp): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbscasecmp (Depends-on): Add mbuiterf. Remove mbuiter.

	mbssep: Optimize.
	* lib/mbssep.c: Include mbuiterf.h instead of mbuiter.h.
	(mbssep): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbssep (Depends-on): Add mbuiterf. Remove mbuiter.

	mbsrchr: Optimize.
	* lib/mbsrchr.c: Include mbuiterf.h instead of mbuiter.h.
	(mbsrchr): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbsrchr (Depends-on): Add mbuiterf. Remove mbuiter.

	mbschr: Optimize.
	* lib/mbschr.c: Include mbuiterf.h instead of mbuiter.h.
	(mbschr): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbschr (Depends-on): Add mbuiterf. Remove mbuiter.

	mbslen: Optimize.
	* lib/mbslen.c: Include mbuiterf.h instead of mbuiter.h.
	(mbslen): Use mbuif_* macros instead of mbui_* macros.
	* modules/mbslen (Depends-on): Add mbuiterf. Remove mbuiter.

	mbuiterf: Add a benchmark.
	* tests/bench-mbuiterf.c: New file, based on tests/bench-mbuiter.c.
	* modules/mbuiterf-bench-tests: New file, based on
	modules/mbuiter-bench-tests.

	mbuiterf: New module.
	* lib/mbuiterf.h: New file, based on lib/mbuiter.h.
	* lib/mbuiterf.c: New file, based on lib/mbuiter.c.
	* modules/mbuiterf: New file, based on modules/mbuiter.

0001-mbuiterf-New-module.patch
0002-mbuiterf-Add-a-benchmark.patch
0003-mbslen-Optimize.patch
0004-mbschr-Optimize.patch
0005-mbsrchr-Optimize.patch
0006-mbssep-Optimize.patch
0007-mbscasecmp-Optimize.patch
0008-mbsncasecmp-Optimize.patch
0009-mbspcasecmp-Optimize.patch
0010-mbspbrk-Optimize.patch
0011-mbscspn-Optimize.patch
0012-mbsspn-Optimize.patch