[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: From wchar_t to char32_t, new module mbszero
From: |
Paul Eggert |
Subject: |
Re: From wchar_t to char32_t, new module mbszero |
Date: |
Sun, 16 Jul 2023 15:28:45 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 |
On 2023-07-16 01:43, Bruno Haible wrote:
Paul Eggert wrote:
However, after implementing mbszero with this data and enabling its use
in many places, I got test failures on NetBSD and Solaris.
- On NetBSD, the minimum we need to clear is 28 bytes.
- On Solaris OmniOS and OpenIndiana, the minimum we need to clear is 16
bytes.
- On proprietary Solaris, the minimum we need to clear is 20 or 28 bytes
(depending on 32-bit or 64-bit mode).
So, clearly this is fragile stuff.
Were the test failures for single calls to mbrtoc32, or were they for
something else? If the former, what went wrong with the source-code
analysis?
Assuming the problem was not with single calls to mbrtoc32, how about if
we define another function mbszerotoc32 that is the minimum number of
bytes to clear so that a single call to mbrtoc32 will work? Such a macro
would be useful for diffutils, and I expect elsewhere.
> +/* _GL_MBSTATE_INIT_SIZE describes how mbsinit() behaves: It is the
number of
+ bytes at the beginning of an mbstate_t that need to be zero, for mbsinit()
+ to return true.
This macro is not used anywhere. How about adding a comment explaining
why it's defined but not used? Or if it's not needed we can remove it.
+# elif __GLIBC__ >= 2 /* glibc */
Should be glibc 2.2 not glibc 2, if I read the history right. Also
should check that __GLIBC__ is defined, for the benefit of picky compilers.
# elif defined MUSL_LIBC /* musl libc */
/* mbstate_t is defined in <bits/alltypes.h>.
It is an opaque aligned 8-byte struct, of which at most the first
4 bytes are used.
For more details, see src/multibyte/mbrtowc.c. */
# define _GL_MBSTATE_INIT_SIZE 4
# define _GL_MBSTATE_ZERO_SIZE 4
Better to say 'sizeof (unsigned)' instead of '4', as the source code
uses 'unsigned'. This wouldn't affect the machine code on existing
platforms, would be better documentation, and would be theoretically
better on oddball future platforms. Similarly for other parts of wchar.in.h.
+# elif defined __sun /* Solaris */
+/* On Solaris, mbstate_t is defined in <wchar_impl.h>.
+ It is an opaque aligned 24-byte or 32-byte struct, of which at most the
first
+ 20 or 28 bytes are used.
+ For more details, see the *State types in
+ illumos-gate/usr/src/lib/libc/port/locale/
+ {none,euc,mskanji,big5,gb2312,gbk,gb18030,utf8}.c. */
+/* File INIT_SIZE ZERO_SIZE
+ none.c 0 0
+ euc.c 12 12
+ mskanji.c 4 4
+ big5.c 4 4
+ gb2312.c 4 6
+ gbk.c 4 4
+ gb18030.c 4 8
+ utf8.c 12 12 */
+/* But 12 is not the correct value: we get test failures
+ - in OpenIndiana and OmniOS: for values < 16,
+ - in Solaris 10 and 11: for values < 20 (in 32-bit mode)
+ or < 28 (in 64-bit mode). */
+# if defined _LP64
+# define _GL_MBSTATE_INIT_SIZE 28
+# define _GL_MBSTATE_ZERO_SIZE 28
+# else
+# define _GL_MBSTATE_INIT_SIZE 20
+# define _GL_MBSTATE_ZERO_SIZE 20
+# endif
On 64-bit Solaris 10 sparc with either GCC or Oracle's compiler, it's
the same number of insns to initialize 28 bytes vs 32 bytes. So if 28 is
needed let's drop the optimization for Solaris as not worth the
aggravation of maintaining and worrying about its brittleness. 32-bit
Solaris is obsolete and will stop working in 2038 and is not worth
worrying about even if it's one less insn, so let's drop it for that too.
+# define _GL_MBSTATE_ZERO_SIZE sizeof (mbstate_t)
Might be better to have this sort of thing at the end, as the default,
rather than sprinkle lots of copies of it elsewhere. Likewise for
defaulting _GL_MBSTATE_INIT_SIZE to _GL_MBSTATE_ZERO_SIZE, since
initializing the latter implies initializing the former.
Proposed patch attached; it implements the above suggestions except for
the comment about _GL_MBSTATE_INIT_SIZE. I haven't installed this.
0001-wchar-new-function-mbszerotoc32.patch
Description: Text Data
- Re: From wchar_t to char32_t, (continued)
- Re: From wchar_t to char32_t, Bruno Haible, 2023/07/06
- Re: From wchar_t to char32_t, Paul Eggert, 2023/07/06
- mbcel module for Gnulib?, Paul Eggert, 2023/07/09
- Re: From wchar_t to char32_t, Bruno Haible, 2023/07/10
- Re: From wchar_t to char32_t, Paul Eggert, 2023/07/11
- Re: From wchar_t to char32_t, Bruno Haible, 2023/07/11
- Re: From wchar_t to char32_t, Paul Eggert, 2023/07/11
- Re: From wchar_t to char32_t, Bruno Haible, 2023/07/13
- Re: From wchar_t to char32_t, Paul Eggert, 2023/07/13
- Re: From wchar_t to char32_t, new module mbszero, Bruno Haible, 2023/07/16
- Re: From wchar_t to char32_t, new module mbszero,
Paul Eggert <=
- Re: From wchar_t to char32_t, new module mbszero, Bruno Haible, 2023/07/17
- Re: From wchar_t to char32_t, new module mbszero, Paul Eggert, 2023/07/18
- Re: From wchar_t to char32_t, new module mbszero, Bruno Haible, 2023/07/19
- Re: From wchar_t to char32_t, new module mbszero, Bruno Haible, 2023/07/17