Re: 16-bit wchar_t on Windows and Cygwin
From: Eric Blake
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 02 Feb 2011 16:19:40 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
On 02/02/2011 04:03 PM, Bruno Haible wrote:
>> Are you thinking of making a sane wrapping around either 4-byte wchar_t
>> or which maps to 2-byte wchar_t but sanely handles UTF-16 (which makes
>> it a thin wrapper on both Linux and Cygwin, but needing more work on
>> mingw), or are you thinking that it is always a 4-byte type (needing
>> lots more memory manipulation on cygwin to convert between 2- and 4-byte
>> representations when using cygwin's functions, or else reimplementing
>> everything from scratch by completely bypassing cygwin)?
>
> I'm not sure I understand your question. The plan is that
>
> - On platforms with a 32-bit wchar_t, like glibc, *BSD, and many others,
> 'wwchar_t' is identical to 'wchar_t', and the function wrappers are
> simple redirections.
>
> - On Cygwin and mingw, wwchar_t is 'uint32_t' (so as to accommodate
> all Unicode characters and WEOF and so that it plays well with 'wint_t').
> mbrtowwc is implemented by 1 or 2 calls to mbrtowc. mbsrtowwcs may be
> implemented by a call to mbsrtowcs and an additional conversion loop,
> or it might be implemented on top of mbrtowwc; that's merely a speed
> vs. memory trade-off.
> The plan is not to "completely bypassing cygwin", but to use as much
> of Cygwin's built-ins as makes sense.
You answered my question in spite of myself. I was asking:

- should wwchar_t (or xwchar_t, but not xchar_t) be 2 bytes on cygwin,
  where, unlike the POSIX definition of wchar_t as always 1 character
  per unit, the new type is explicitly documented as being multi-unit
  on some platforms but with sane semantics,

- or should it always be 4 bytes, where conversion from wchar_t to
  wwchar_t requires some effort, and where the new type must be used
  everywhere (which means wrapping a lot of APIs), but where you can
  once again assume POSIX semantics of 1 character per unit,
  simplifying life for callers at the expense of converting to the
  new type?

Having asked the question in those more detailed words, I agree with
your conclusion: on cygwin, wwchar_t should be 4 bytes.
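
To make the 4-byte option a bit more concrete, here is a rough sketch of
the "additional conversion loop" that would turn 2-byte UTF-16 units into
4-byte characters. It is only an illustration under assumptions:
uint16_t stands in for cygwin's 16-bit wchar_t so that the code behaves
the same on any platform, wwchar_t is typedef'd to uint32_t as proposed
above, and the helper names (is_high_surrogate, is_low_surrogate,
combine_surrogates, utf16_to_wwcs) are hypothetical, not existing gnulib
or cygwin interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per the proposal, wwchar_t is uint32_t on Cygwin and mingw. */
typedef uint32_t wwchar_t;

/* Hypothetical helpers; uint16_t models a 16-bit wchar_t unit. */
static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

static int
is_low_surrogate (uint16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid surrogate pair into one code point above the BMP. */
static wwchar_t
combine_surrogates (uint16_t hi, uint16_t lo)
{
  assert (is_high_surrogate (hi) && is_low_surrogate (lo));
  return 0x10000 + (((wwchar_t) (hi - 0xD800) << 10)
                    | (wwchar_t) (lo - 0xDC00));
}

/* Convert N UTF-16 units from SRC into DST, one character per element.
   DST must have room for N elements.  Return the number of characters
   written, or (size_t) -1 if SRC contains an unpaired surrogate.  */
static size_t
utf16_to_wwcs (wwchar_t *dst, const uint16_t *src, size_t n)
{
  size_t i = 0;
  size_t out = 0;

  while (i < n)
    {
      if (is_high_surrogate (src[i]))
        {
          /* A high surrogate must be followed by a low surrogate. */
          if (i + 1 >= n || !is_low_surrogate (src[i + 1]))
            return (size_t) -1;
          dst[out++] = combine_surrogates (src[i], src[i + 1]);
          i += 2;
        }
      else if (is_low_surrogate (src[i]))
        return (size_t) -1;     /* Low surrogate with no preceding high. */
      else
        dst[out++] = src[i++];  /* BMP character: copied unchanged. */
    }
  return out;
}
```

After such a loop, callers can again treat each array element as exactly
one character, which is the POSIX-style assumption the 4-byte choice is
meant to restore. The cost is the extra pass and the larger buffer.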
>
> - On platforms with a 16-bit wchar_t but where the wchar_t[] encoding
> in Unicode locales is merely UCS-2, like AIX, use the no-op thin
> wrappers as well. If the platform does not support more than the BMP,
> it does not make much sense for GNU programs to try to work around that.
Agreed.
Next question/thought. Gnulib should definitely tackle this first. But
if it works out, should we also add wwchar_t natively into cygwin? It
would certainly be easier to add new interfaces incrementally, in
preparation for a possible future ABI conversion to make wchar_t become
4 bytes.
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org