Re: inputting characters by hexadigit
From: Kenichi Handa
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 10:23:57 +0900
In article <address@hidden>, Juri Linkov <address@hidden> writes:
> > I think it is better to skip these ranges:
> > #x3400..#x4dbf -- CJK Ideograph Extension A
> > #x4e00..#x9fff -- CJK Ideograph
> > #xd800..#xfaff -- surrogate pairs, private use, CJK COMPATIBILITY IDEOGRAPH
> > #x20000..#x2ffff -- CJK Ideograph Extension B
> > and end the loop at #xeffff (#xf0000.. are for private use)
> Actually there are no Unicode names in these ranges in UnicodeData.txt.
> It has only lines for the first and the last character in these ranges:
Yes. But for CJK chars,
  (get-char-code-property CHAR 'name)
returns a valid name, something like "CJK IDEOGRAPH-3400"(*),
because get-char-code-property not only looks up
UnicodeData.txt but also computes a proper value if
necessary.
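As a minimal sketch of that lookup (the wrapper name `my-char-name' is hypothetical and only for illustration; it assumes an Emacs built with the Unicode property tables):

```elisp
(defun my-char-name (char)
  "Return the Unicode name of CHAR, or nil if it has none."
  (get-char-code-property char 'name))

;; A name that comes straight from the UnicodeData.txt tables.
(my-char-name ?A)

;; A code point for which UnicodeData.txt lists only the range
;; endpoints; the name is computed rather than looked up.
(my-char-name #x3400)
```

The exact string returned for the second call depends on how Emacs generates names for that range, so it is not shown here.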
> If it were possible to loop over names instead of looping over all
> characters to check for their names, then this code would be faster,
> but I don't see how it would be possible to loop over all defined names
> in UnicodeData.txt.
> If this is not possible then we could optimize the loop over all
> characters in the chartable to skip these useless ranges.
I don't think that works, because the names of Hangul
syllables must also be computed algorithmically(*). I think
just doing something like this is good enough:
  (dotimes (c #xEFFFF)
    (unless (CHAR-IS-IN-A-RANGE-TO-SKIP-P c)
      ...))
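One possible shape for that predicate, as a sketch only: the names `my-skipped-char-ranges' and `my-char-in-skipped-range-p' are hypothetical, and the ranges are simply the ones quoted at the top of this message.

```elisp
(defconst my-skipped-char-ranges
  '((#x3400 . #x4dbf)     ; CJK Ideograph Extension A
    (#x4e00 . #x9fff)     ; CJK Ideograph
    (#xd800 . #xfaff)     ; surrogates, private use, CJK compatibility
    (#x20000 . #x2ffff))  ; CJK Ideograph Extension B
  "Inclusive code point ranges to skip (assumed from this thread).")

(defun my-char-in-skipped-range-p (c)
  "Return t if code point C lies in one of `my-skipped-char-ranges'."
  (catch 'skip
    (dolist (range my-skipped-char-ranges)
      (when (and (>= c (car range)) (<= c (cdr range)))
        (throw 'skip t)))
    nil))
```

Note that the Hangul syllables (#xac00..#xd7a3) fall outside these ranges, so a loop using this predicate would still visit them.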
(*): "The Unicode Standard 5.1" has this section.
4.8 Name—Normative
[...]
Ideographs and Hangul Syllables. Names for ideographs and
Hangul syllables are derived algorithmically. Unified CJK
ideographs are named CJK UNIFIED IDEOGRAPH-x, where x is
replaced with the hexadecimal Unicode code point—for
example, CJK UNIFIED IDEOGRAPH-4E00. Similarly,
compatibility CJK ideographs are named “CJK COMPATIBILITY
IDEOGRAPH-x”. The names of Hangul syllables are generated as
described in “Hangul Syllable Names” in Section 3.12,
Conjoining Jamo Behavior.
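
To illustrate what "derived algorithmically" means here, the following is a rough sketch, not the code Emacs actually uses. All `my-' names are hypothetical. The jamo short names and the constants (base #xac00, 21 vowels, 28 trailing forms) follow the Hangul syllable name rules in Section 3.12; the ideograph branch simply uses the #x4e00..#x9fff range mentioned in this thread, not the exact set of assigned code points.

```elisp
(defconst my-hangul-leads
  ["G" "GG" "N" "D" "DD" "R" "M" "B" "BB" "S"
   "SS" "" "J" "JJ" "C" "K" "T" "P" "H"]
  "Short names of the 19 leading consonants.")

(defconst my-hangul-vowels
  ["A" "AE" "YA" "YAE" "EO" "E" "YEO" "YE" "O" "WA" "WAE"
   "OE" "YO" "U" "WEO" "WE" "WI" "YU" "EU" "YI" "I"]
  "Short names of the 21 vowels.")

(defconst my-hangul-trails
  ["" "G" "GG" "GS" "N" "NJ" "NH" "D" "L" "LG" "LM" "LB" "LS" "LT"
   "LP" "LH" "M" "B" "BS" "S" "SS" "NG" "J" "C" "K" "T" "P" "H"]
  "Short names of the 28 trailing forms (the first means no trailer).")

(defun my-derived-char-name (c)
  "Return an algorithmically derived name for code point C, or nil.
Covers only precomposed Hangul syllables and the basic unified CJK
ideograph range; everything else returns nil."
  (cond
   ((and (>= c #xac00) (<= c #xd7a3))
    (let* ((s (- c #xac00))        ; index of the syllable in its block
           (l (/ s 588))           ; 588 = 21 vowels * 28 trailing forms
           (v (/ (% s 588) 28))
           (tr (% s 28)))
      (concat "HANGUL SYLLABLE "
              (aref my-hangul-leads l)
              (aref my-hangul-vowels v)
              (aref my-hangul-trails tr))))
   ((and (>= c #x4e00) (<= c #x9fff))
    (format "CJK UNIFIED IDEOGRAPH-%X" c))))

(my-derived-char-name #xd4db)  ; => "HANGUL SYLLABLE PWILH"
(my-derived-char-name #x4e00)  ; => "CJK UNIFIED IDEOGRAPH-4E00"
```

Since these names come from arithmetic on the code point, they never appear as individual lines in UnicodeData.txt, which is why looping over that file alone would miss them.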
---
Kenichi Handa
address@hidden