Re: From wchar_t to char32_t
From: Bruno Haible
Date: Wed, 12 Jul 2023 00:32:42 +0200
Paul Eggert wrote:
> > * The locale encoding is BIG5-HKSCS, e.g. the zh_HK.BIG5-HKSCS
> >   locale on a glibc system.
> >
> > * The input is one of the 4 characters in that encoding that map to
> > a sequence of two Unicode characters:
> >
> >      input        maps to
> >      ---------    -------------
> >      0x88 0x62    U+00CA U+0304
> >      0x88 0x64    U+00CA U+030C
> >      0x88 0xA3    U+00EA U+0304
> >      0x88 0xA5    U+00EA U+030C
> ...
>
> I looked into this some more and unfortunately don't understand the
> above. Could you explain a bit more?
My statement is about BIG5-HKSCS.
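
To make the quoted table concrete: each of those four BIG5-HKSCS byte pairs decodes to a base letter (Ê or ê) followed by a combining mark (U+0304 COMBINING MACRON or U+030C COMBINING CARON), so no single char32_t can hold the result. The sketch below only encodes the table above as data; the struct and function names are hypothetical and are not taken from glibc or gnulib.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical table: the four BIG5-HKSCS byte pairs that decode to two
   Unicode characters (values copied from the quoted table).  */
struct hkscs_pair
{
  unsigned char b1, b2;   /* the two input bytes */
  char32_t c1, c2;        /* base letter, then combining mark */
};

static const struct hkscs_pair hkscs_pairs[] =
{
  { 0x88, 0x62, 0x00CA, 0x0304 },  /* E with circumflex + combining macron */
  { 0x88, 0x64, 0x00CA, 0x030C },  /* E with circumflex + combining caron */
  { 0x88, 0xA3, 0x00EA, 0x0304 },  /* e with circumflex + combining macron */
  { 0x88, 0xA5, 0x00EA, 0x030C },  /* e with circumflex + combining caron */
};

/* Hypothetical helper: if (B1, B2) is one of the sequences above, store
   both resulting characters in *C1 and *C2 and return 1; else return 0.  */
static int
hkscs_lookup_pair (unsigned char b1, unsigned char b2,
                   char32_t *c1, char32_t *c2)
{
  for (size_t i = 0; i < sizeof hkscs_pairs / sizeof hkscs_pairs[0]; i++)
    if (hkscs_pairs[i].b1 == b1 && hkscs_pairs[i].b2 == b2)
      {
        *c1 = hkscs_pairs[i].c1;
        *c2 = hkscs_pairs[i].c2;
        return 1;
      }
  return 0;
}
```

A decoder that hits one of these entries has two characters to hand back from a single two-byte input, which is the situation the (size_t) -3 return value of mbrtoc32 exists for.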
> <http://www.nits.org.cn/index/article/4034> says that the official
> mapping table for GB 18030-2022 and BMP is here:
>
> http://www.nits.org.cn/cmsfile/download/134
>
> and this contains the following (nonconsecutive) lines:
>
> 5746 8862
> 5749 8864
> 57BC 88A3
> 57BE 88A5
>
> which, if I understand things correctly, means the four two-byte
> sequences that you mention should convert to the following four Unicode
> characters:
>
> 坆 U+5746 CJK IDEOGRAPH-5746
> 坉 U+5749 CJK IDEOGRAPH-5749
> 垼 U+57BC CJK IDEOGRAPH-57BC
> 垾 U+57BE CJK IDEOGRAPH-57BE
>
> without mbrtoc32 having to return (size_t) -3.
>
> Perhaps there was a problem with an earlier version of GB 18030 that has
> been fixed in the 2022 edition?
You are looking at GB18030. GB18030 and BIG5-HKSCS are completely
unrelated.
References:
https://www.haible.de/bruno/charsets/conversion-tables/Chinese.html
https://en.wikipedia.org/wiki/Big5#HKSCS
https://en.wikipedia.org/wiki/GB_18030
Bruno
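
For readers following the (size_t) -3 discussion: since C11, mbrtoc32 may return (size_t) -3 to say that it stored a further character produced by earlier input without consuming any new bytes. The following is a minimal sketch, assuming a NUL-terminated input string and the current locale, of a caller loop that handles every documented return value. The name decode_c32 is hypothetical, and nothing here assumes that a particular libc actually returns (size_t) -3 for BIG5-HKSCS.

```c
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decode the NUL-terminated multibyte string S,
   in the current locale, into at most MAX char32_t units in OUT.
   Returns the number of units stored, or (size_t) -1 on an encoding
   error, truncated input, or if OUT is too small.  */
static size_t
decode_c32 (const char *s, char32_t *out, size_t max)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* start in the initial shift state */
  size_t left = strlen (s) + 1;       /* include the terminating NUL */
  size_t n = 0;

  for (;;)
    {
      char32_t c;
      size_t r = mbrtoc32 (&c, s, left, &state);

      if (r == (size_t) -1 || r == (size_t) -2)
        return (size_t) -1;           /* invalid or incomplete sequence */
      if (r == 0)
        return n;                     /* decoded the terminating NUL */
      if (n == max)
        return (size_t) -1;           /* output buffer too small */

      out[n++] = c;

      /* For r == (size_t) -3, C is an extra character left over from
         input already consumed, so S and LEFT must not advance.  */
      if (r != (size_t) -3)
        {
          s += r;
          left -= r;
        }
    }
}
```

The key design point is that a (size_t) -3 result is stored like any other character but does not move the input pointer; code that instead adds the return value to the pointer would jump far past the end of the buffer.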