bug-gnulib

Re: c32width gives incorrect return values in C locale


From: Eli Zaretskii
Subject: Re: c32width gives incorrect return values in C locale
Date: Sun, 12 Nov 2023 08:17:55 +0200

> From: Bruno Haible <bruno@clisp.org>
> Cc: bug-libunistring@gnu.org
> Date: Sat, 11 Nov 2023 23:54:52 +0100
> 
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > <https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.
> 
> That's because some Unicode characters have "ambiguous width": width 1 in
> Western locales, width 2 in East Asian locales (for historic and font choice
> reasons).

I think this should be explained in the documentation, if it isn't
already.  This "ambiguous width" issue is very subtle and unknown to
many (most?) people, so not having it explicit in the documentation is
not user-friendly, IMO.

> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can always be "UTF-8"; then uc_width will always choose width 1 for
> these characters.

Regardless of the locale?  Is there an assumption that UTF-8 means
"not CJK" or something?
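
To make the question concrete, here is a minimal sketch of a width rule
that is driven only by the encoding name, as the quoted answer describes.
It is not libunistring's code: sketch_width, is_legacy_cjk_encoding,
is_ambiguous_width, the list of encoding names, and the handful of
character ranges are all assumptions made for illustration. Wide,
combining, and control characters are ignored entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical list of legacy East Asian encodings (an assumption,
   not taken from libunistring). */
int is_legacy_cjk_encoding(const char *encoding)
{
    static const char *const names[] = {
        "EUC-JP", "EUC-KR", "EUC-TW", "GB2312", "GBK", "BIG5"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(encoding, names[i]) == 0)
            return 1;
    return 0;
}

/* Tiny stand-in for an East Asian Width "Ambiguous" lookup.
   It covers only a few characters, enough for the illustration. */
int is_ambiguous_width(uint_least32_t uc)
{
    return uc == 0x00A7                       /* SECTION SIGN */
        || (uc >= 0x03B1 && uc <= 0x03C1)     /* Greek small alpha..rho */
        || (uc >= 0x2460 && uc <= 0x2473);    /* circled numbers 1..20 */
}

/* Hypothetical width rule: an ambiguous-width character takes 2 columns
   only when the encoding name is a legacy CJK one; everything else is
   treated as 1 column.  The locale is never consulted. */
int sketch_width(uint_least32_t uc, const char *encoding)
{
    if (is_ambiguous_width(uc) && is_legacy_cjk_encoding(encoding))
        return 2;
    return 1;
}
```

Under these assumptions, sketch_width(0x03B1, "UTF-8") yields 1 and
sketch_width(0x03B1, "EUC-JP") yields 2, whatever the current locale is.
Whether that matches what users in CJK UTF-8 locales expect is exactly
the open question.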

> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from <uchar.h> (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from <stdint.h>.

AFAIU, char32_t is identical to uint_least32_t (which is also from
stdint.h).
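
A small C11 check of the type relationship, offered as a sketch rather
than a statement about any particular platform. The static assertion
relies on the C11 definition of char32_t in <uchar.h>; the helper name
char32_matches_uint32 is made up for this example, and it only reports
what the compiler in use decides about uint32_t.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <uchar.h>

/* C11 defines char32_t as the same type as uint_least32_t, so this
   should hold on any conforming implementation. */
static_assert(_Generic((char32_t)0, uint_least32_t: 1, default: 0),
              "char32_t is expected to be uint_least32_t");

/* Returns 1 when uint32_t exists and names the same type as char32_t,
   0 otherwise.  uint32_t is optional in ISO C, so it is guarded. */
int char32_matches_uint32(void)
{
#ifdef UINT32_MAX
    return _Generic((char32_t)0, uint32_t: 1, default: 0);
#else
    return 0;
#endif
}
```

On mainstream platforms the helper is expected to return 1, in which
case no cast is needed between the three types; the standard itself only
guarantees the static assertion.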


