bug-gnulib

Re: mbcel module for Gnulib?


From: Paul Eggert
Subject: Re: mbcel module for Gnulib?
Date: Wed, 12 Jul 2023 12:40:05 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023-07-11 15:14, Bruno Haible wrote:

The performance problems that I see are:

   - glibc's conversion functions are optimized for long sequences
     (think of iconv()). They are not optimized for short invocations
     (one multibyte character or less). This is a long-standing problem,
     that no one is attacking.

Yes, but let's defer that for now. Attacking it would require rethinking the mbrtowc/mbrtoc32 interface entirely, which is a bigger task.


   - glibc's UTF-8 converter is very slow for texts with many non-ASCII
     characters

Although there are ways to attack that, they're a bigger lift and for now I'd like to defer this too.


   - In the C locale (tests a, c, f), conversions of bytes < 0x80 are
     cheap, whereas conversions of bytes >= 0x80 are expensive, because
     in this code path, glibc returns (size_t)-1 and mbrtoc32.c invokes
     hard_locale.

There are opportunities for performance improvement here in the mbcel world. A new scanning primitive could accept an argument that is the value of MB_CUR_MAX or that is true for the C locale, or whatever. Callers could hoist that out of loops and this could improve speed further.

However, diff already hoists for other reasons and does not use mbcel in single-byte locales, so this sort of thing wouldn't help diff. This, plus the extra complexity the additional scanning primitive would add to mbcel.h, is why I didn't implement such a primitive.
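
A minimal sketch of what such hoisting might look like follows. It is not taken from mbcel.h: the names scan_len and count_chars are hypothetical, and it assumes the caller derives a "unibyte" flag once from the locale-dependent MB_CUR_MAX and that a byte below 0x80 always stands for itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical primitive: return the length in bytes of the character
   starting at P (P < LIM).  UNIBYTE is computed once by the caller.  */
static size_t
scan_len (char const *p, char const *lim, bool unibyte, mbstate_t *ps)
{
  /* Fast path: single-byte locale, or an ASCII byte (assumed safe).  */
  if (unibyte || (unsigned char) *p < 0x80)
    return 1;

  char32_t ch;
  size_t n = mbrtoc32 (&ch, p, (size_t) (lim - p), ps);
  if (n == 0 || n >= (size_t) -3)
    {
      /* Null, encoding error, or incomplete character:
         reset the state and treat the byte as one unit.  */
      memset (ps, 0, sizeof *ps);
      return 1;
    }
  return n;
}

/* Count the characters in BUF[0..LEN-1] in the current locale.  */
static size_t
count_chars (char const *buf, size_t len)
{
  bool unibyte = MB_CUR_MAX == 1;   /* hoisted out of the loop */
  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t count = 0;
  char const *lim = buf + len;
  for (char const *p = buf; p < lim; count++)
    p += scan_len (p, lim, unibyte, &state);
  return count;
}
```

In a single-byte locale the loop never reaches mbrtoc32, so the expensive path for bytes >= 0x80 is avoided entirely.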


     Can we optimize the need for calling hard_locale so often, somehow?
     Or create a variant of mbrtoc32 that fetches the value of hard_locale
     from some cache (maybe a __thread variable)?
     Or can hard_locale itself be optimized (through dirty, glibc specific
     hacks)?

All possible, I suppose, but as described above I wonder whether it would be worth the effort in the mbcel world.


I do *not* see a performance problem with character encodings such as
ISO-8859-7 or GB18030 (tests g, j): the figures are comparable with UTF-8.

Yes, though mbcel should speed up both UTF-8 and the other encodings, compared to mbiter.


How can this be? The Emacs source code is mostly ASCII, and the figures
above (test a, b) show that for this case, mbiter is well optimized.

Although mbiter is faster on ASCII than on non-ASCII, it's not well-optimized compared to mbcel. On Fedora x86-64 here is the kernel of an mbcel-based loop that merely scans ASCII and adds each byte's numerical value to a sum:

 .L28:  addq    %rax, %rbp
        movl    $1, %eax
        addq    %rax, %rbx
        cmpq    %r12, %rbx
        jnb     .L19
        movsbq  (%rbx), %rax
        testb   %al, %al
        jns     .L28

where %rbp = sum, %rbx = pointer to next byte, and %r12 = pointer just past end of input.

In contrast, with mbiter the kernel is:

 .L24:  movq    136(%rsp), %rax
        movq    112(%rsp), %r14
        movq    $1, 144(%rsp)
        movsbl  (%rbx), %edx
        movb    $1, 152(%rsp)
        leaq    1(%rax), %rbx
        movl    %edx, 156(%rsp)
        movq    %rbx, 136(%rsp)
        addq    %rdx, %r15
        movb    $0, 128(%rsp)
        cmpq    %r14, %rbx
        jnb     .L5
 .L14:  movzbl  (%rbx), %ecx
        movl    %ecx, %eax
        shrb    $5, %al
        andl    $7, %eax
        movl    is_basic_table(,%rax,4), %eax
        shrl    %cl, %eax
        testb   $1, %al
        jne     .L24

where %r15 = sum, %rbx = pointer to next byte, and %r14 = pointer just past end of input.
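
For readers who prefer C to assembly, a loop of roughly the mbcel shape can be sketched as below. This is an illustration only, not the actual mbcel.h source or the benchmark code: cel_t, cel_scan and sum_text are hypothetical names, and the sketch assumes a stateless encoding in which bytes below 0x80 are always single characters. The point is that the scanner returns a small struct by value, so the ASCII path needs no stores to memory.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical result type, small enough to be returned in registers.  */
typedef struct
{
  char32_t ch;          /* character, if ERR is zero */
  unsigned char err;    /* offending byte on encoding error, else 0 */
  unsigned char len;    /* number of bytes consumed, always >= 1 */
} cel_t;

/* Scan one character starting at P, where P < LIM.  */
static inline cel_t
cel_scan (char const *p, char const *lim)
{
  unsigned char c = (unsigned char) *p;
  if (c < 0x80)
    return (cel_t) { .ch = c, .len = 1 };   /* ASCII fast path */

  /* Slow path: fresh state each time (assumes a stateless encoding).  */
  mbstate_t st;
  memset (&st, 0, sizeof st);
  char32_t ch;
  size_t n = mbrtoc32 (&ch, p, (size_t) (lim - p), &st);
  if (n == 0 || n >= (size_t) -3)
    return (cel_t) { .err = c, .len = 1 };  /* treat the byte as an error */
  return (cel_t) { .ch = ch, .len = (unsigned char) n };
}

/* Add up the values of the characters (or error bytes) in BUF.  */
static unsigned long long
sum_text (char const *buf, size_t len)
{
  unsigned long long sum = 0;
  char const *lim = buf + len;
  for (char const *p = buf; p < lim; )
    {
      cel_t g = cel_scan (p, lim);
      sum += g.err ? g.err : g.ch;
      p += g.len;
    }
  return sum;
}
```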


I'd better try to copy the worthy optimizations into mbiter, mbuiter.

I don't see how, offhand. Although I'm sure mbiter can be improved, I don't see how it could catch up to mbcel so long as it continues to solve a harder problem than mbcel solves.

Candidates for optimization:

- The C locale handling
   https://sourceware.org/bugzilla/show_bug.cgi?id=19932
   https://sourceware.org/bugzilla/show_bug.cgi?id=29511
   It's now a clear POSIX violation. Would it make sense to get this fixed
   in glibc, so that gnulib's override can be dropped on future glibc
   versions?

Absolutely. Does наб's patch in the latter bug report look good to you? If so, I can propose it on libc-alpha after the next release (as it's too late for the next release).


- Resetting an mbstate_t: Should we define a function
      void mbszero (mbstate_t *);
   that clears the relevant part of an mbstate_t (i.e. 24 bytes instead
   of 128 bytes on BSD systems)?
   Advantage: performance.
   Drawback: Yet another gnulib-invented, nonstandard API.

It's likely worth it for mbcel on BSDish hosts. Quite possibly it's also worth it for mbiter and mbuiter. Not sure it's worth it everywhere.

Also I still think it's at most 12 bytes, not 24, though I guess that thread <https://lists.gnu.org/r/bug-gnulib/2023-07/msg00044.html> is still ongoing.
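
As a rough sketch of the idea, with the platform-specific size deliberately left out: my_mbszero and MY_MBSTATE_ZERO_SIZE below are hypothetical names, and the default clears the whole object, which is always safe. A port to a BSDish host would presumably define the macro to whatever prefix size is eventually agreed on.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical: number of leading bytes of mbstate_t that the C library
   actually inspects.  Defaults to the whole object.  */
#ifndef MY_MBSTATE_ZERO_SIZE
# define MY_MBSTATE_ZERO_SIZE sizeof (mbstate_t)
#endif

/* Put *PS into the initial conversion state.  */
static inline void
my_mbszero (mbstate_t *ps)
{
  memset (ps, 0, MY_MBSTATE_ZERO_SIZE);
}
```

The saving comes from clearing fewer bytes each time a scan restarts, which matters only when the reset sits in an inner loop.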


- Is a functional interface faster than one that gets a 'struct' passed
  by reference? I would guess no, since gcc optimizes both cases well,
  especially when inlining. But feel free to prove me wrong.

In my measurements mbcel is significantly faster than mbiter on the current Gnulib benchmarks (commit 6a455aab6ef2c6efd6703cfa9319268b503a8e14), even when mbrtoc32-regular is used. You should be able to verify this by running:

./gnulib-tool --create-testdir --dir b mbrtoc32-regular mbiter-bench-tests

and then applying the attached patch, running "make check", and then running these commands in the gltests subdirectory:

  ./bench-mbiter abcdefghij 100000
  ./bench-mbcel  abcdefghij 100000

Here's a summary of the results I got on Fedora 38 x86-64 on an AMD Phenom II X4 910e processor dated 2010.

 user CPU sec   speedup
 mbiter  mbcel  factor  test
  1.735  0.478  3.630   a - ASCII text, C locale
  1.703  0.447  3.810   b - ASCII text, UTF-8 locale
  3.852  1.514  2.544   c - French text, C locale
  3.544  1.600  2.215   d - French text, ISO-8859-1 locale
  3.651  1.662  2.197   e - French text, UTF-8 locale
 26.787 15.115  1.772   f - Greek text, C locale
 21.651 17.106  1.266   g - Greek text, ISO-8859-7 locale
 22.565 17.633  1.280   h - Greek text, UTF-8 locale
 10.011  8.051  1.243   i - Chinese text, UTF-8 locale
  9.787  7.967  1.228   j - Chinese text, GB18030 locale

With a better CPU (a Xeon W-1350 dated 2021) and a slightly slower OS (Ubuntu 23.04) I got these numbers:

 user CPU sec   speedup
 mbiter  mbcel  factor  test
  0.531  0.238  2.231   a - ASCII text, C locale
  0.478  0.187  2.556   b - ASCII text, UTF-8 locale
  1.262  0.510  2.475   c - French text, C locale
  1.121  0.529  2.119   d - French text, ISO-8859-1 locale
  1.080  0.571  1.891   e - French text, UTF-8 locale
 10.349  5.876  1.761   f - Greek text, C locale
  8.530  6.537  1.305   g - Greek text, ISO-8859-7 locale
  8.407  6.506  1.292   h - Greek text, UTF-8 locale
  3.427  2.578  1.329   i - Chinese text, UTF-8 locale
  3.279  2.489  1.317   j - Chinese text, GB18030 locale

Attachment: y.diff
Description: Text Data

