diffutils-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: From wchar_t to char32_t, new module mbszero


From: Paul Eggert
Subject: Re: From wchar_t to char32_t, new module mbszero
Date: Sun, 16 Jul 2023 15:28:45 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023-07-16 01:43, Bruno Haible wrote:
Paul Eggert wrote:

However, after implementing mbszero with this data and enabling its use
in many places, I got test failures on NetBSD and Solaris.
   - On NetBSD, the minimum we need to clear is 28 bytes.
   - On Solaris OmniOS and OpenIndiana, the minimum we need to clear is 16 
bytes.
   - On proprietary Solaris, the minimum we need to clear is 20 or 28 bytes
     (depending on 32-bit or 64-bit mode).
So, clearly this is fragile stuff.

Were the test failures for single calls to mbrtoc32, or were they for something else? If the former, what went wrong with the source-code analysis?

Assuming the problem was not with single calls to mbrtoc32, how about if we define another function mbszerotoc32 that is the minimum number of bytes to clear so that a single call to mbrtoc32 will work? Such a macro would be useful for diffutils, and I expect elsewhere.

> +/* _GL_MBSTATE_INIT_SIZE describes how mbsinit() behaves: It is the number of
+   bytes at the beginning of an mbstate_t that need to be zero, for mbsinit()
+   to return true.

This macro is not used anywhere. How about adding a comment explaining why it's defined but not used? Or if it's not needed we can remove it.


+# elif __GLIBC__ >= 2                                     /* glibc */

Should be glibc 2.2 not glibc 2, if I read the history right. Also should check that __GLIBC__ is defined, for the benefit of picky compilers.


# elif defined MUSL_LIBC                                  /* musl libc */
/* mbstate_t is defined in <bits/alltypes.h>.
   It is an opaque aligned 8-byte struct, of which at most the first
   4 bytes are used.
   For more details, see src/multibyte/mbrtowc.c.  */
#  define _GL_MBSTATE_INIT_SIZE 4
#  define _GL_MBSTATE_ZERO_SIZE 4

Better to say 'sizeof (unsigned)' instead of '4', as the source code uses 'unsigned'. This wouldn't affect the machine code on existing platforms, would be better documentation, and would be theoretically better on oddball future platforms. Similarly for other parts of wchar.in.h.


+# elif defined __sun                                      /* Solaris */
+/* On Solaris, mbstate_t is defined in <wchar_impl.h>.
+   It is an opaque aligned 24-byte or 32-byte struct, of which at most the 
first
+   20 or 28 bytes are used.
+   For more details, see the *State types in
+   illumos-gate/usr/src/lib/libc/port/locale/
+   {none,euc,mskanji,big5,gb2312,gbk,gb18030,utf8}.c.  */
+/* File       INIT_SIZE  ZERO_SIZE
+   none.c         0          0
+   euc.c         12         12
+   mskanji.c      4          4
+   big5.c         4          4
+   gb2312.c       4          6
+   gbk.c          4          4
+   gb18030.c      4          8
+   utf8.c        12         12 */
+/* But 12 is not the correct value: we get test failures
+     - in OpenIndiana and OmniOS: for values < 16,
+     - in Solaris 10 and 11: for values < 20 (in 32-bit mode)
+       or < 28 (in 64-bit mode).  */
+#  if defined _LP64
+#   define _GL_MBSTATE_INIT_SIZE 28
+#   define _GL_MBSTATE_ZERO_SIZE 28
+#  else
+#   define _GL_MBSTATE_INIT_SIZE 20
+#   define _GL_MBSTATE_ZERO_SIZE 20
+#  endif

On 64-bit Solaris 10 sparc with either GCC or Oracle's compiler, it's the same number of insns to initialize 28 bytes vs 32 bytes. So if 28 is needed let's drop the optimization for Solaris as not worth the aggravation of maintaining and worrying about its brittleness. 32-bit Solaris is obsolete and will stop working in 2038 and is not worth worrying about even if it's one less insn, so let's drop it for that too.

+#  define _GL_MBSTATE_ZERO_SIZE sizeof (mbstate_t)

Might be better to have this sort of thing at the end, as the default, rather than sprinkle lots of copies of it elsewhere. Likewise for defaulting _GL_MBSTATE_INIT_SIZE to _GL_MBSTATE_ZERO_SIZE, since initializing the latter implies initializing the former.

Proposed patch attached; it implements the above suggestions except for the comment about _GL_MBSTATE_INIT_SIZE. I haven't installed this.

Attachment: 0001-wchar-new-function-mbszerotoc32.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]