coreutils
Re: Multibyte support (round 2)


From: Eric Blake
Subject: Re: Multibyte support (round 2)
Date: Mon, 29 Aug 2016 12:13:12 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/27/2016 12:05 AM, Assaf Gordon wrote:

> Regarding wchar_t == UCS:

> And so, the question becomes:
> When the locale is "UTF-8", is the internal representation of 'wchar_t'
> identical to UCS-2 or UCS-4 (i.e. Unicode code points)?
> While the standard explicitly says this cannot be assumed,
> I think in practice it is always the case.
> 
> It is so in glibc and musl-libc,
> and in OpenBSD, FreeBSD, and NetBSD with "UTF-8" locales (but not in
> non-UTF-8 locales).

But not in Cygwin, where wchar_t is 2 bytes and where Cygwin already
uses surrogate pairs in wchar_t to represent Unicode characters beyond
U+FFFF. Such a representation violates the POSIX definition of wchar_t,
which is supposed to represent every possible character as a single
wchar_t value. It was nevertheless deemed a better solution than
limiting Cygwin to BMP characters only, and it affects only code that
explicitly uses characters outside the BMP.
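
To make the difference concrete, here is a minimal sketch of how code
might cope with both cases. It assumes wchar_t values are Unicode code
units (UTF-16 where wchar_t is 2 bytes, as described above for Cygwin,
and full code points where it is 4 bytes). The helper names are
hypothetical and are not taken from coreutils or gnulib.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Nonzero if a single wchar_t can hold any Unicode code point
   (up to U+10FFFF).  Expected to be false where wchar_t is 2 bytes. */
static int wchar_holds_all_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

static int is_high_surrogate(uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a UTF-16 surrogate pair into one code point. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* Decode one code point from s (n >= 1 units available) and store the
   number of wchar_t units consumed in *used.  Surrogate pairs are only
   joined when wchar_t is too narrow to hold every code point. */
static uint32_t next_code_point(const wchar_t *s, size_t n, size_t *used)
{
    uint32_t u = (uint32_t) s[0];

    if (!wchar_holds_all_unicode() && n >= 2
        && is_high_surrogate(u) && is_low_surrogate((uint32_t) s[1])) {
        *used = 2;
        return combine_surrogates(u, (uint32_t) s[1]);
    }
    *used = 1;
    return u;
}
```

With a 4-byte wchar_t the loop over a string always advances by one
unit; with a 2-byte wchar_t it advances by two for characters outside
the BMP, which is the only case where the two models differ.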

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
