Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
From: John Kearney
Subject: Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.
Date: Thu, 23 Feb 2012 03:43:49 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20120129 Thunderbird/10.0
^ Caveat: you can represent the full range up to 0x10ffff in UTF-16; you
just need 2 UTF-16 code units (a surrogate pair). Check out the latest
version of unicode.c for an example of how.
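
For anyone who wants the arithmetic spelled out, here is a minimal sketch
of the surrogate-pair split. It is not the code from unicode.c; the
function name u32_to_utf16 and its return convention are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from unicode.c): encode one Unicode
 * code point as UTF-16 code units. Returns the number of units written
 * to out (1 or 2), or 0 if cp is not a valid Unicode scalar value.
 */
static size_t
u32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                   /* out of range, or a lone surrogate */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;      /* basic plane: one code unit */
        return 1;
    }

    cp -= 0x10000;                  /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

Calling u32_to_utf16(0x1F600, buf) should return 2 and leave 0xD83D,
0xDE00 in buf; 0x10ffff comes out as 0xDBFF, 0xDFFF.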
On 02/22/2012 11:32 PM, Eric Blake wrote:
> On 02/22/2012 03:01 PM, Linda Walsh wrote:
>> My question had to do with an unqualified wint_t not
>> unsigned wint_t and what platform existed where an 'int' type or
>> wide-int_t, was, without qualifiers, unsigned. I still would like
>> to know -- and posix allows int/wide-ints to be unsigned without
>> the unsigned keyword?
>
> 'int' is signed, and at least 16 bits (these days, it's usually 32). It
> can also be written 'signed int'.
>
> 'unsigned int' is unsigned, and at least 16 bits (these days, it's
> usually 32).
>
> 'wchar_t' is an arbitrary integral type, either signed or unsigned, and
> capable of holding the value of all valid wide characters. It is
> possible to define a system where wchar_t and char are identical
> (limiting yourself to 256 valid characters), but that is not done in
> practice. More common are platforms that use 65536 characters (only the
> basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10ffff) for
> 32 bits. Platforms that use 65536 characters and 16-bit wchar_t must
> have wchar_t be unsigned; whereas platforms that have wchar_t wider than
> the largest valid character can choose signed or unsigned with no impact.
>
> 'wint_t' is an arbitrary integral type, either signed or unsigned, at
> least as wide as wchar_t, and capable of holding the value of all valid
> wide characters and the sentinel WEOF. Like wchar_t, it may hold values
> that are neither WEOF nor valid characters; and in fact, it is more
> likely to do so, since either wchar_t is saturated (all bit values are
> valid characters) and thus wint_t is a wider type, or wchar_t is sparse
> (as is the case with 32-bit wchar_t encoding Unicode), and the addition
> of WEOF to the set does not plug in the remaining sparse values; but
> using such values has unspecified results on any interface that takes a
> wint_t. WEOF only has to be distinct, it does not have to be negative.
>
> Don't think of it as 'wide-int', rather, think of it as 'the integral
> type that both contains wchar_t and WEOF'. You cannot write 'signed
> wint_t' nor 'unsigned wint_t'.
>
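
The points above about wchar_t, wint_t and WEOF can be checked on a given
platform with something like the following. This is only a sketch; the
helper names (wchar_is_signed, wint_is_signed, count_wide_chars) are
invented for illustration and do not come from bash or any library.

```c
#include <stdio.h>
#include <wchar.h>

/* Nonzero if wchar_t is a signed type on this platform. */
static int
wchar_is_signed(void)
{
    return (wchar_t)-1 < (wchar_t)0;
}

/* Nonzero if wint_t is a signed type on this platform. */
static int
wint_is_signed(void)
{
    return (wint_t)-1 < (wint_t)0;
}

/*
 * Count the wide characters remaining on fp. The result of fgetwc() is
 * kept in a wint_t, not a wchar_t, because it must be able to hold WEOF
 * in addition to every valid wide character. The loop compares against
 * WEOF directly and never assumes that WEOF is negative.
 */
static long
count_wide_chars(FILE *fp)
{
    long n = 0;
    wint_t c;

    while ((c = fgetwc(fp)) != WEOF)
        n++;
    return n;
}
```

The signedness helpers may trigger an "always false" compiler warning on
platforms where the type is unsigned; that is the expected answer, not a
bug.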