Re: mbcel module for Gnulib?
From: Bruno Haible
Subject: Re: mbcel module for Gnulib?
Date: Mon, 17 Jul 2023 00:03:21 +0200
Hi Paul,
I replied:
> > Here's a summary of the results I got on Fedora 38 x86-64 on an AMD
> > Phenom II X4 910e processor dated 2010.
> >
> >   user CPU sec    speedup
> >   mbiter   mbcel  factor  test
> >    1.735   0.478  3.630   a - ASCII text, C locale
> >    1.703   0.447  3.810   b - ASCII text, UTF-8 locale
> >    3.852   1.514  2.544   c - French text, C locale
> >    3.544   1.600  2.215   d - French text, ISO-8859-1 locale
> >    3.651   1.662  2.197   e - French text, UTF-8 locale
> >   26.787  15.115  1.772   f - Greek text, C locale
> >   21.651  17.106  1.266   g - Greek text, ISO-8859-7 locale
> >   22.565  17.633  1.280   h - Greek text, UTF-8 locale
> >   10.011   8.051  1.243   i - Chinese text, UTF-8 locale
> >    9.787   7.967  1.228   j - Chinese text, GB18030 locale
>
> Impressive! I'll repeat these benchmarks, after having optimized mbiter
> a bit more.
Even after optimizing away the is_basic_table, I'm measuring essentially the
same timing differences as you did (on an AMD Ryzen 7):
     mbiter   mbcel  factor
  a   0.849   0.221   3.84
  b   0.994   0.221   4.50
  c   1.959   0.674
  d   1.598   0.726
  e   1.876   0.749
  f  14.284   5.813
  g   8.580   6.715
  h   9.080   6.686
  i   4.167   2.849
  j   4.356   3.062
That is a factor of about 4 for the ASCII text cases (a, b).
The inner loops still look the same (this is with gcc 13.1, -O2):
Inner loop with mbcel:
.L18:
addq %rax, %r14
movl $1, %eax
.L7:
addq %rax, %rbx
cmpq %r15, %rbx
jnb .L5
.L8:
movsbq (%rbx), %rax
testb %al, %al
jns .L18
Inner loop with mbiter:
.L24:
movq 136(%rsp), %rax
movq 112(%rsp), %r15
movq $1, 144(%rsp)
movsbl (%rbx), %ecx
movb $1, 152(%rsp)
leaq 1(%rax), %rbx
movl %ecx, 156(%rsp)
.L7:
movq %rbx, 136(%rsp)
addq %rcx, %r14
movb $0, 128(%rsp)
cmpq %r15, %rbx
jnb .L5
.L14:
cmpb $0, (%rbx)
jns .L24
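For readers who prefer C to assembly, the loop shape behind the mbcel listing can be sketched roughly as follows. This is a simplified stand-in and not the actual mbcel code: `scan_t`, `scan_one` and `sum_chars` are hypothetical names, and the sketch assumes that every non-ASCII byte is a one-byte unit so that it stays self-contained. The comments relate each step to the instructions above as I read them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical by-value scan result: the character and its length in bytes.  */
typedef struct { int ch; size_t len; } scan_t;

static inline scan_t
scan_one (char const *p, char const *lim)
{
  scan_t g;
  unsigned char c = (unsigned char) *p;
  assert (p < lim);
  /* ASCII fast path (compare testb/jns in the listing): length 1.
     Assumption: a real implementation would decode a multibyte
     sequence for c >= 0x80; this sketch treats the byte as one unit.  */
  g.ch = c;
  g.len = 1;
  return g;
}

/* Sum the character values in buf[0..n-1], like the benchmark loop appears to do.  */
long long
sum_chars (char const *buf, size_t n)
{
  char const *lim = buf + n;
  long long sum = 0;
  for (char const *p = buf; p < lim; )
    {
      scan_t g = scan_one (p, lim);
      sum += g.ch;      /* addq %rax, %r14 */
      p += g.len;       /* addq %rax, %rbx after movl $1, %eax */
    }
  return sum;
}
```

Since the result and the loop pointer are plain local variables here, the compiler is free to keep them in registers, which is consistent with the absence of stack stores in the mbcel listing.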
With an old gcc 4.2.4 I get these timings:
     mbiter   mbcel  factor
  a   0.988   0.700   1.41
  b   1.031   0.716   1.44
  c   2.754   1.435
  d   2.004   1.432
  e   2.017   1.487
  f  20.237   8.242
  g  13.971   9.701
  h  13.117   9.626
  i   5.553   4.068
  j   6.542   4.080
and this code:
Inner loop with mbcel:
.L23:
movsbl %al,%eax
movl $1, %ecx
movl %eax, -248(%ebp)
movl $0, -244(%ebp)
.L13:
movl -248(%ebp), %eax
addl %eax, -232(%ebp)
movl -244(%ebp), %edx
adcl %edx, -228(%ebp)
addl %ecx, %ebx
cmpl %edi, %ebx
jae .L9
.L10:
movzbl (%ebx), %eax
testb %al, %al
jns .L23
Inner loop with mbiter:
.L32:
movl -100(%ebp), %ecx
movl $1, -96(%ebp)
movl -116(%ebp), %esi
movsbl (%ecx),%eax
movb $1, -92(%ebp)
movl %eax, -88(%ebp)
.L13:
xorl %edx, %edx
movl %ecx, %ebx
addl %eax, -288(%ebp)
adcl %edx, -284(%ebp)
addl -96(%ebp), %ebx
movb $0, -104(%ebp)
cmpl %esi, %ebx
movl %ebx, -100(%ebp)
jae .L9
.L10:
cmpb $0, (%ebx)
jns .L32
So, here the performance difference was not so dramatic.
So, clearly, current compilers generate more efficient code for the
struct-as-return-value approach than the compilers that were available
when mbiter was designed (in 2001-2005).
I will thus create new variants of mbiter and mbuiter, called mbitern and
mbuitern ('n' for "new"), that allow better optimization by current GCC.
The main API differences between mbiter and mbcel are:
- mbiter produces a result, by reference, that includes a 'struct mbchar';
  mbcel's result is by value.
- In mbiter, the loop control is in the struct.
  In mbcel, it is in local variables.
- In mbiter, mbi_avail can be called multiple times, and the multibyte
  character is only consumed by mbi_advance (atomically).
  In mbcel, the multibyte character is only available after mbcel_scan
  was invoked in the loop's body.
  This difference creates the possibility (programmer mistake) that the
  result of mbcel_scan gets used while it is not yet ready. But this
  is not a big problem, since gcc would warn about "uninitialized" variables
  when the programmer makes this mistake. (Another thing we could not
  rely on in 2001-2005.)
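
To make these three differences concrete, here is a minimal sketch of the two styles side by side. It does not use the real mbiter or mbcel headers: `struct iter`, `iter_init`, `iter_avail`, `iter_advance`, `cel_t`, `cel_scan` and `decode_one` are hypothetical names, and the toy decoder assumes that every byte is one character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy decoder (assumption): every byte counts as one character.
   Real code would decode multibyte sequences here.  */
static int
decode_one (char const *p, size_t *len)
{
  *len = 1;
  return (unsigned char) *p;
}

/* Style 1, mbiter-like: loop control and current character share one
   struct that is updated through a pointer.  */
struct iter
{
  char const *cur;
  char const *lim;
  bool cached;      /* true once ch/len describe the character at cur */
  int ch;
  size_t len;
};

void
iter_init (struct iter *it, char const *buf, size_t n)
{
  it->cur = buf;
  it->lim = buf + n;
  it->cached = false;
}

/* May be called any number of times; it never consumes input.  */
bool
iter_avail (struct iter *it)
{
  if (it->cur >= it->lim)
    return false;
  if (!it->cached)
    {
      it->ch = decode_one (it->cur, &it->len);
      it->cached = true;
    }
  return true;
}

/* The only place where the character is consumed.  */
void
iter_advance (struct iter *it)
{
  assert (it->cached);
  it->cur += it->len;
  it->cached = false;
}

long long
sum_by_reference (char const *buf, size_t n)
{
  struct iter it;
  long long sum = 0;
  for (iter_init (&it, buf, n); iter_avail (&it); iter_advance (&it))
    sum += it.ch;
  return sum;
}

/* Style 2, mbcel-like: the scan result is returned by value and the
   loop control stays in local variables.  */
typedef struct { int ch; size_t len; } cel_t;

cel_t
cel_scan (char const *p, char const *lim)
{
  cel_t g;
  assert (p < lim);
  g.ch = decode_one (p, &g.len);
  return g;
}

long long
sum_by_value (char const *buf, size_t n)
{
  char const *lim = buf + n;
  long long sum = 0;
  for (char const *p = buf; p < lim; )
    {
      cel_t g = cel_scan (p, lim);   /* g is valid only from here on */
      sum += g.ch;
      p += g.len;
    }
  return sum;
}
```

In the second style, declaring `g` earlier and reading it before the `cel_scan` call is the kind of mistake that gcc's uninitialized-variable warnings are meant to catch.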
Bruno