[Octave-bug-tracker] [bug #63139] About string support for Chinese, Japa
From: Arun Giridhar
Subject: [Octave-bug-tracker] [bug #63139] About string support for Chinese, Japanese and Korean characters
Date: Fri, 30 Sep 2022 08:57:52 -0400 (EDT)
Follow-up Comment #1, bug #63139 (project octave):
This is not CJK-specific; it comes from how Unicode text is represented
internally. AFAIK Octave uses UTF-8, meaning that a string of Unicode
characters is stored as a byte stream, with one char element per byte. Each of
your two Chinese characters takes 24 bits (3 bytes) of UTF-8, of which 16 bits
are content and 8 bits are preset marker bits, as described here:
https://en.wikipedia.org/wiki/UTF-8#Encoding
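To make the 16 content bits / 8 preset bits split concrete, here is a minimal sketch that strips the marker bits from one three-byte sequence and rebuilds the code point. The byte values are typed in as numbers (E4 BD A0, the UTF-8 bytes of '你') so the result does not depend on the encoding of your terminal or script file; the variable names are only illustrative.

```octave
% UTF-8 bytes of '你' (hex E4 BD A0), entered numerically
bytes = [228 189 160];

% Three-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
% Mask off the preset bits (4 + 2 + 2 = 8) and keep the content bits (4 + 6 + 6 = 16)
hi  = bitand (bytes(1), 15);   % low 4 bits of the lead byte
mid = bitand (bytes(2), 63);   % low 6 bits of the first continuation byte
lo  = bitand (bytes(3), 63);   % low 6 bits of the second continuation byte

% Reassemble the 16 content bits into the Unicode code point
codepoint = hi * 4096 + mid * 64 + lo;
fprintf ("U+%04X\n", codepoint);   % prints U+4F60
```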
You can verify the encoding is correct UTF-8 with these commands:
>> foo = dec2bin ('你' + 0, 8)'(:)'
foo = 111001001011110110100000
>> foo = dec2bin ('好' + 0, 8)'(:)'
foo = 111001011010010110111101
>> foo = dec2bin ('你好' + 0, 8)'(:)'
foo = 111001001011110110100000111001011010010110111101
>> foo = '你好';
>> whos foo
Variables visible from the current scope:

variables in scope: top scope

  Attr   Name        Size                     Bytes  Class
  ====   ====        ====                     =====  =====
         foo         1x6                          6  char
What does MATLAB return for those commands? That will tell you which encoding
it uses internally.
As far as I can see, this is not a bug, unless you get wrong results as a
consequence of different Unicode representations.
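One place where the representation does show up is in element counts: with UTF-8 storage, numel counts bytes, not characters. As a rough sketch (assuming the string holds valid UTF-8; the variable names are just for illustration), you can count characters by skipping continuation bytes, which always have the bit pattern 10xxxxxx:

```octave
% UTF-8 bytes of '你好', entered numerically so the example does not
% depend on the encoding of the terminal or script file
foo = char ([228 189 160 229 165 189]);

n_bytes = numel (foo);                                   % 6 char elements (bytes)

% Continuation bytes look like 10xxxxxx: top two bits are 1 and 0
is_continuation = bitand (double (foo), 192) == 128;

% Every byte that is not a continuation byte starts a new character
n_chars = sum (~is_continuation);                        % 2 characters

fprintf ("%d bytes, %d characters\n", n_bytes, n_chars);
```

The same caveat applies to indexing: foo(1) here is a single byte, not the first character.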
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?63139>