Re: mbcel module for Gnulib?
From: Paul Eggert
Subject: Re: mbcel module for Gnulib?
Date: Wed, 12 Jul 2023 12:40:05 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023-07-11 15:14, Bruno Haible wrote:
> The performance problems that I see are:
>   - glibc's conversion functions are optimized for long sequences
>     (think of iconv()). They are not optimized for short invocations
>     (one multibyte character or less). This is a long-standing problem,
>     that no one is attacking.

Yes, but let's defer that for now. Attacking it would require rethinking
the mbrtowc/mbrtoc32 interface entirely, which is a bigger task.

>   - glibc's UTF-8 converter is very slow for texts with many non-ASCII
>     characters

Although there are ways to attack that, they're a bigger lift and for
now I'd like to defer this too.

>   - In the C locale (tests a, c, f), conversions of bytes < 0x80 are
>     cheap, whereas conversions of bytes >= 0x80 are expensive, because
>     in this code path, glibc returns (size_t)-1 and mbrtoc32.c invokes
>     hard_locale.

There are opportunities for performance improvement here in the mbcel
world. A new scanning primitive could accept an argument that is the
value of MB_LEN_MAX or that is true for the C locale, or whatever.
Callers could hoist that out of loops and this could improve speed further.
However, diff already hoists for other reasons and does not use mbcel in
single-byte locales, so this sort of thing wouldn't help diff. This,
plus the extra complexity the additional scanning primitive would add to
mbcel.h, is why I didn't implement such a primitive.
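
To make the hoisting idea concrete, here is a minimal sketch of what such a primitive might look like. It is not part of mbcel.h: scan_len and count_chars are hypothetical names, the flag is derived from MB_CUR_MAX as one possible choice, and the code assumes that a byte below 0x80 is always a single-byte character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical primitive: return the byte length of the character
   starting at P, where P < LIM.  UNIBYTE is supplied by the caller
   and means that every byte is a character by itself.  */
static size_t
scan_len (char const *p, char const *lim, bool unibyte)
{
  assert (p < lim);
  /* Assumption: bytes below 0x80 are single-byte characters.  */
  if (unibyte || (unsigned char) *p < 0x80)
    return 1;
  mbstate_t st;
  memset (&st, 0, sizeof st);
  size_t n = mbrtowc (NULL, p, lim - p, &st);
  /* Treat a null character, an invalid byte, or an incomplete
     sequence as one byte, so that the scan always advances.  */
  return n == 0 || (size_t) -2 <= n ? 1 : n;
}

/* Count the characters in BUF[0..SIZE-1].  */
static size_t
count_chars (char const *buf, size_t size)
{
  /* Hoisted: the locale is examined once, not once per character.  */
  bool unibyte = MB_CUR_MAX == 1;
  char const *lim = buf + size;
  size_t count = 0;
  for (char const *p = buf; p < lim; p += scan_len (p, lim, unibyte))
    count++;
  return count;
}
```

In a single-byte locale the loop never reaches mbrtowc, which is the saving the extra argument would buy. The cost is one more parameter that every caller has to compute and pass.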

>     Can we optimize the need for calling hard_locale so often, somehow?
>     Or create a variant of mbrtoc32 that fetches the value of hard_locale
>     from some cache (maybe a __thread variable)?
>     Or can hard_locale itself be optimized (through dirty, glibc specific
>     hacks)?

All possible, I suppose, but as described above I wonder whether it
would be worth the effort in the mbcel world.
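
For reference, the cache variant could be sketched roughly as follows. This is only an illustration: query_hard_locale is a simplified stand-in for Gnulib's hard_locale (it just compares the LC_CTYPE locale name with "C" and "POSIX"), and all three function names are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for hard_locale: true if the LC_CTYPE locale
   is something other than "C" or "POSIX".  */
static bool
query_hard_locale (void)
{
  char const *name = setlocale (LC_CTYPE, NULL);
  assert (name != NULL);
  return !(strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* Per-thread cache: -1 means unknown, otherwise 0 or 1.  */
static _Thread_local signed char hard_cache = -1;

/* Return the cached answer, querying the locale only on first use.  */
static bool
cached_hard_locale (void)
{
  if (hard_cache < 0)
    hard_cache = query_hard_locale ();
  return hard_cache;
}

/* Must be called after anything that changes the locale.  */
static void
invalidate_hard_locale_cache (void)
{
  hard_cache = -1;
}
```

The weak point is invalidation: setlocale affects the whole process, whereas this cache is per thread, so a locale change made in one thread would leave stale values in the others unless every thread resets its own copy.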

> I do *not* see a performance problem with character encodings such as
> ISO-8859-7 or GB18030 (tests g, j): the figures are comparable with UTF-8.

Yes, though mbcel should speed up both UTF-8 and the other encodings,
compared to mbiter.

> How can this be? The Emacs source code is mostly ASCII, and the figures
> above (test a, b) show that for this case, mbiter is well optimized.

Although mbiter is faster on ASCII than on non-ASCII, it's not
well-optimized compared to mbcel. On Fedora x86-64 here is the kernel of
an mbcel-based loop that merely scans ASCII and adds each byte's
numerical value to a sum:

  .L28:   addq    %rax, %rbp
          movl    $1, %eax
          addq    %rax, %rbx
          cmpq    %r12, %rbx
          jnb     .L19
          movsbq  (%rbx), %rax
          testb   %al, %al
          jns     .L28

where %rbp = sum, %rbx = pointer to next byte, and %r12 = pointer just
past end of input.
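
For orientation, C source of roughly the following shape can compile down to a kernel like that. This is a sketch, not the actual mbcel.h code: cel_t, cel_scan and sum_chars are hypothetical names, it uses wchar_t and mbrtowc rather than char32_t and mbrtoc32 to stay within plain ISO C, and it assumes that a byte below 0x80 is always a single-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type: a character and its length in bytes,
   returned by value so that it can live in registers.  */
typedef struct { wchar_t ch; unsigned char len; } cel_t;

/* Decode the character starting at P, where P < LIM.  */
static cel_t
cel_scan (char const *p, char const *lim)
{
  assert (p < lim);
  unsigned char b = *p;
  /* Fast path (assumption: bytes below 0x80 stand for themselves).  */
  if (b < 0x80)
    return (cel_t) { .ch = b, .len = 1 };

  /* Slow path: decode from a fresh conversion state.  */
  mbstate_t st;
  memset (&st, 0, sizeof st);
  wchar_t wc;
  size_t n = mbrtowc (&wc, p, lim - p, &st);
  /* On an invalid or incomplete sequence, consume one byte and
     report its value, so that the caller always makes progress.  */
  if (n == 0 || (size_t) -2 <= n)
    return (cel_t) { .ch = b, .len = 1 };
  return (cel_t) { .ch = wc, .len = n };
}

/* Add up the values of the characters in BUF[0..SIZE-1].  */
static unsigned long
sum_chars (char const *buf, size_t size)
{
  unsigned long sum = 0;
  char const *lim = buf + size;
  for (char const *p = buf; p < lim; )
    {
      cel_t c = cel_scan (p, lim);
      sum += c.ch;
      p += c.len;
    }
  return sum;
}
```

Once cel_scan is inlined, the ASCII case needs no stores to memory: the loop state is just the pointer, the limit and the sum.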
In contrast, with mbiter the kernel is:

  .L24:   movq    136(%rsp), %rax
          movq    112(%rsp), %r14
          movq    $1, 144(%rsp)
          movsbl  (%rbx), %edx
          movb    $1, 152(%rsp)
          leaq    1(%rax), %rbx
          movl    %edx, 156(%rsp)
          movq    %rbx, 136(%rsp)
          addq    %rdx, %r15
          movb    $0, 128(%rsp)
          cmpq    %r14, %rbx
          jnb     .L5
  .L14:   movzbl  (%rbx), %ecx
          movl    %ecx, %eax
          shrb    $5, %al
          andl    $7, %eax
          movl    is_basic_table(,%rax,4), %eax
          shrl    %cl, %eax
          testb   $1, %al
          jne     .L24

where %r15 = sum, %rbx = pointer to next byte, %r14 = pointer just past
end of input.

> I'd better try to copy the worthy optimizations into mbiter, mbuiter.

I don't see how, offhand. Although I'm sure mbiter can be improved, I
don't see how it could catch up to mbcel so long as it continues to
solve a harder problem than mbcel solves.

> Candidates for optimization:
>   - The C locale handling
>     https://sourceware.org/bugzilla/show_bug.cgi?id=19932
>     https://sourceware.org/bugzilla/show_bug.cgi?id=29511
>     It's now a clear POSIX violation. Would it make sense to get this fixed
>     in glibc, so that gnulib's override can be dropped on future glibc
>     versions?

Absolutely. Does наб's patch in the latter bug report look good to you?
If so, I can propose it on libc-alpha after the next release (as it's
too late for the next release).

>   - Resetting an mbstate_t: Should we define a function
>       void mbszero (mbstate_t *);
>     that clears the relevant part of an mbstate_t (i.e. 24 bytes instead
>     of 128 bytes on BSD systems)?
>     Advantage: performance.
>     Drawback: Yet another gnulib-invented, nonstandard API.

It's likely worth it for mbcel on BSDish hosts. Quite possibly it's also
worth it for mbiter and mbuiter. Not sure it's worth it everywhere.
Also I still think it's at most 12 bytes, not 24, though I guess that
thread <https://lists.gnu.org/r/bug-gnulib/2023-07/msg00044.html> is
still ongoing.
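
To illustrate the shape of such a function without settling the size question, here is a sketch. The names sketch_mbszero and MBSTATE_RELEVANT_SIZE are hypothetical; the macro defaults to the whole object, which is always safe, and a platform-specific build would have to define it to whatever smaller prefix is known to suffice there.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical knob: the number of leading bytes of an mbstate_t
   that must be zero for it to be in the initial state.  The default
   clears everything; a smaller value is platform-specific.  */
#ifndef MBSTATE_RELEVANT_SIZE
# define MBSTATE_RELEVANT_SIZE sizeof (mbstate_t)
#endif

/* Put *PS into the initial conversion state.  */
static void
sketch_mbszero (mbstate_t *ps)
{
  assert (ps != NULL);
  memset (ps, 0, MBSTATE_RELEVANT_SIZE);
}
```

A caller would use it wherever it now writes memset (&state, 0, sizeof state). Any gain comes only on hosts where the relevant prefix is much smaller than the object.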

>   - Is a functional interface faster than one that gets a 'struct' passed
>     by reference? I would guess no, since gcc optimizes both cases well,
>     especially when inlining. But feel free to prove me wrong.

In my benchmarks mbcel is significantly faster than mbiter on current
(commit 6a455aab6ef2c6efd6703cfa9319268b503a8e14) Gnulib benchmarks,
even when mbrtoc32-regular is used. You should be able to verify this by
running:

  ./gnulib-tool --create-testdir --dir b mbrtoc32-regular mbiter-bench-tests

and then applying the attached patch, running "make check", and then
running these commands in the gltests subdirectory:

  ./bench-mbiter abcdefghij 100000
  ./bench-mbcel abcdefghij 100000

Here's a summary of the results I got on Fedora 38 x86-64 on an AMD
Phenom II X4 910e processor dated 2010.

    user CPU sec    speedup
   mbiter    mbcel   factor  test
    1.735    0.478    3.630  a - ASCII text, C locale
    1.703    0.447    3.810  b - ASCII text, UTF-8 locale
    3.852    1.514    2.544  c - French text, C locale
    3.544    1.600    2.215  d - French text, ISO-8859-1 locale
    3.651    1.662    2.197  e - French text, UTF-8 locale
   26.787   15.115    1.772  f - Greek text, C locale
   21.651   17.106    1.266  g - Greek text, ISO-8859-7 locale
   22.565   17.633    1.280  h - Greek text, UTF-8 locale
   10.011    8.051    1.243  i - Chinese text, UTF-8 locale
    9.787    7.967    1.228  j - Chinese text, GB18030 locale

With a better CPU (a Xeon W-1350 dated 2021) and a slightly-slower OS
(Ubuntu 23.04) I got these numbers:

    user CPU sec    speedup
   mbiter    mbcel   factor  test
    0.531    0.238    2.231  a - ASCII text, C locale
    0.478    0.187    2.556  b - ASCII text, UTF-8 locale
    1.262    0.510    2.475  c - French text, C locale
    1.121    0.529    2.119  d - French text, ISO-8859-1 locale
    1.080    0.571    1.891  e - French text, UTF-8 locale
   10.349    5.876    1.761  f - Greek text, C locale
    8.530    6.537    1.305  g - Greek text, ISO-8859-7 locale
    8.407    6.506    1.292  h - Greek text, UTF-8 locale
    3.427    2.578    1.329  i - Chinese text, UTF-8 locale
    3.279    2.489    1.317  j - Chinese text, GB18030 locale

Attachment: y.diff (Description: Text Data)