Re: inputting characters by hexadigit

emacs-devel

From:	Kenichi Handa
Subject:	Re: inputting characters by hexadigit
Date:	Sun, 20 Jul 2008 10:23:57 +0900

In article <address@hidden>, Juri Linkov <address@hidden> writes:

> > I think it is better to skip these ranges:
> >   #x3400..#x4dbf   -- CJK Ideograph Extension A
> >   #x4e00..#x9fff   -- CJK Ideograph
> >   #xd800..#xfaff   -- surrogate pair, private use, CJK COMPATIBILITY IDEOGRAPH
> >   #x20000..#x2ffff -- CJK Ideograph Extension B
> > and end the loop at #xeffff (#xf0000.. are for private use)

> Actually there are no Unicode names in these ranges in UnicodeData.txt.
> It has only lines for the first and the last character in these ranges:

Yes.  But, for CJK chars:

   (get-char-code-property CHAR 'name)

returns a valid name such as "CJK IDEOGRAPH-3400"(*),
because get-char-code-property not only looks up
UnicodeData.txt but also computes a proper value if
necessary.
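
For illustration, a minimal sketch of the call described above. The helper name `my-char-name' is hypothetical; only `get-char-code-property' comes from the message. The exact string returned for a CJK character may differ between Emacs versions, so no output is shown.

```elisp
;; Hypothetical wrapper around the call quoted in the message above.
(defun my-char-name (char)
  "Return the Unicode name of CHAR, or nil if it has none."
  (get-char-code-property char 'name))

;; A character listed explicitly in UnicodeData.txt:
(my-char-name ?A)

;; A CJK ideograph, which UnicodeData.txt covers only as part of a range:
(my-char-name #x3400)
```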

> If it were possible to loop over names instead of looping over all
> characters to check for their names, then this code would be faster,
> but I don't see how it would be possible to loop over all defined names
> in UnicodeData.txt.

> If this is not possible then we could optimize the loop over all
> characters in the chartable to skip these useless ranges.

I think it doesn't work because Hangul syllable names
must also be computed algorithmically(*).  I think
just doing something like this is good enough:

  (dotimes (c #xEFFFF)
    (unless (CHAR-IS-IN-A-RANGE-TO-SKIP-P c)
      ...))
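
One way the loop above could be filled in, as a sketch only. The names `my-skipped-char-ranges', `my-char-in-skipped-range-p' and `my-collect-char-names' are hypothetical stand-ins for the placeholder predicate; the ranges are the ones quoted earlier in this thread. Since `dotimes' stops one short of its count, the sketch uses #xF0000 so that #xEFFFF is visited too.

```elisp
;; Hypothetical names; the ranges are those quoted earlier in the thread.
(defconst my-skipped-char-ranges
  '((#x3400 . #x4DBF)    ; CJK Ideograph Extension A
    (#x4E00 . #x9FFF)    ; CJK Ideograph
    (#xD800 . #xFAFF)    ; surrogates, private use, CJK compatibility
    (#x20000 . #x2FFFF)) ; CJK Ideograph Extension B
  "Inclusive code point ranges to skip when collecting names.")

(defun my-char-in-skipped-range-p (c)
  "Return non-nil if C lies in one of `my-skipped-char-ranges'."
  (let ((ranges my-skipped-char-ranges)
        found)
    (while (and ranges (not found))
      (when (and (>= c (car (car ranges)))
                 (<= c (cdr (car ranges))))
        (setq found t))
      (setq ranges (cdr ranges)))
    found))

(defun my-collect-char-names ()
  "Return an alist of (NAME . CHAR) for characters outside the skipped ranges."
  (let (names)
    (dotimes (c #xF0000)                ; visits 0 .. #xEFFFF
      (unless (my-char-in-skipped-range-p c)
        (let ((name (get-char-code-property c 'name)))
          (when name
            (push (cons name c) names)))))
    (nreverse names)))
```

Hangul syllables (#xAC00..#xD7A3) are outside the skipped ranges, so in this sketch their names still come from `get-char-code-property'.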

(*): "The Unicode Standard 5.1" has this section.

4.8 Name—Normative
[...]
Ideographs and Hangul Syllables. Names for ideographs and
Hangul syllables are derived algorithmically. Unified CJK
ideographs are named CJK UNIFIED IDEOGRAPH-x, where x is
replaced with the hexadecimal Unicode code point—for
example, CJK UNIFIED IDEOGRAPH-4E00. Similarly,
compatibility CJK ideographs are named “CJK COMPATIBILITY
IDEOGRAPH-x”. The names of Hangul syllables are generated as
described in “Hangul Syllable Names” in Section 3.12,
Conjoining Jamo Behavior.
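
The rule quoted above can be sketched as follows. This is not how Emacs itself computes names; the function and variable names are hypothetical, the CJK ranges are the rough ones mentioned in this thread, and the Hangul arithmetic and jamo short names are an assumption taken from the syllable name generation described in Section 3.12 of the standard. Compatibility ideographs are left out to keep the sketch short.

```elisp
;; Hypothetical tables of jamo short names (19 leading consonants,
;; 21 vowels, 28 trailing consonants including "none").
(defconst my-hangul-l
  ["G" "GG" "N" "D" "DD" "R" "M" "B" "BB" "S" "SS" "" "J" "JJ"
   "C" "K" "T" "P" "H"])
(defconst my-hangul-v
  ["A" "AE" "YA" "YAE" "EO" "E" "YEO" "YE" "O" "WA" "WAE" "OE" "YO"
   "U" "WEO" "WE" "WI" "YU" "EU" "YI" "I"])
(defconst my-hangul-t
  ["" "G" "GG" "GS" "N" "NJ" "NH" "D" "L" "LG" "LM" "LB" "LS" "LT"
   "LP" "LH" "M" "B" "BS" "S" "SS" "NG" "J" "C" "K" "T" "P" "H"])

(defun my-derived-char-name (c)
  "Return an algorithmically derived name for C, or nil if none applies."
  (cond
   ;; Precomposed Hangul syllables: split the offset from #xAC00
   ;; into leading consonant, vowel and trailing consonant indices.
   ((and (>= c #xAC00) (<= c #xD7A3))
    (let* ((s (- c #xAC00))
           (li (/ s (* 21 28)))
           (vi (/ (% s (* 21 28)) 28))
           (ti (% s 28)))
      (concat "HANGUL SYLLABLE "
              (aref my-hangul-l li)
              (aref my-hangul-v vi)
              (aref my-hangul-t ti))))
   ;; Unified CJK ideographs: the name ends in the hex code point.
   ((or (and (>= c #x3400) (<= c #x4DBF))
        (and (>= c #x4E00) (<= c #x9FFF))
        (and (>= c #x20000) (<= c #x2FFFF)))
    (format "CJK UNIFIED IDEOGRAPH-%X" c))
   (t nil)))
```

For example, (my-derived-char-name #x4E00) gives "CJK UNIFIED IDEOGRAPH-4E00" and (my-derived-char-name #xD4DB) gives "HANGUL SYLLABLE PWILH".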

---
Kenichi Handa
address@hidden
