emacs-devel

Re: bidi properties from uniprop tables


From: Kenichi Handa
Subject: Re: bidi properties from uniprop tables
Date: Sat, 20 Aug 2011 21:42:20 +0900

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > Since Bidi_Class is only used in this algorithm (and explicit property
> > lookups) AFAIK

> That's not true, it is also used in regexp search by category.  So we
> should decide whether to assign these types in the uniprop table, or
> have a fallback for them in bidi.c.  Any opinions?  Handa-san?

As I'm on vacation, I can't access the Emacs source right
now, but I remember that an element of
unidata-SOMETHING-alist (I don't remember what SOMETHING is)
has a slot for the default property value.  So it should be
easy to fix the default value if it is a simple one.

However, the current code can't handle a non-simple default
value like the one described below.

> > it seems reasonable to me that get-char-code-property
> > and friends should return the "strong type" specified by DerivedBIDI
> > (which is LTR it seems, but you should check that).

> No, the type depends on the block:

>   # Unlike other properties, unassigned code points in blocks
>   # reserved for right-to-left scripts are given either types R or AL.
>   #
>   # The unassigned code points that default to AL are in the ranges:
>   #     [\u0600-\u07BF \uFB50-\uFDFF \uFE70-\uFEFF]
>   #
>   #     Arabic:            U+0600  -  U+06FF
>   #     Syriac:            U+0700  -  U+074F
>   #     Arabic_Supplement: U+0750  -  U+077F
>   #     Thaana:            U+0780  -  U+07BF
>   #     Arabic_Presentation_Forms_A:
>   #                        U+FB50  -  U+FDFF
>   #     Arabic_Presentation_Forms_B:
>   #                        U+FE70  -  U+FEFF
>   #           minus noncharacter code points.
>   #
>   # The unassigned code points that default to R are in the ranges:
>   #     [\u0590-\u05FF \u07C0-\u08FF \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\U0001EFFF]
>   #
>   #     Hebrew:            U+0590  -  U+05FF
>   #     NKo:               U+07C0  -  U+07FF
>   #     Cypriot_Syllabary: U+10800 - U+1083F
>   #     Phoenician:        U+10900 - U+1091F
>   #     Lydian:            U+10920 - U+1093F
>   #     Kharoshthi:        U+10A00 - U+10A5F
>   #     and any others in the ranges:
>   #                        U+0800  -  U+08FF,
>   #                        U+FB1D  -  U+FB4F,
>   #                        U+10840 - U+10FFF,
>   #                        U+1E800 - U+1EFFF
>   #
>   # For all other cases:

>   #  All code points not explicitly listed for Bidi_Class
>   #  have the value Left_To_Right (L).

I'll fix the code to handle this when I'm back at work next
Monday.
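For illustration only, here is a minimal C sketch of the kind of block-dependent fallback discussed in this thread, similar to what might live in bidi.c. It applies only to code points that have no explicit Bidi_Class entry. The ranges are copied from the DerivedBidiClass comment quoted above. Following that comment, noncharacter code points are excluded from the AL ranges and fall through to L. The enum, struct, and function names are hypothetical and are not Emacs's actual bidi_type_t or any existing Emacs function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default classes; not Emacs's bidi_type_t. */
typedef enum { DEFAULT_BIDI_L, DEFAULT_BIDI_R, DEFAULT_BIDI_AL } default_bidi_t;

/* One contiguous block of unassigned code points and its default class. */
struct default_range { uint32_t lo, hi; default_bidi_t cls; };

/* Ranges taken from the quoted DerivedBidiClass.txt comment. */
static const struct default_range default_ranges[] = {
  { 0x0590,  0x05FF,  DEFAULT_BIDI_R  },  /* Hebrew */
  { 0x0600,  0x07BF,  DEFAULT_BIDI_AL },  /* Arabic .. Thaana */
  { 0x07C0,  0x08FF,  DEFAULT_BIDI_R  },  /* NKo and others */
  { 0xFB1D,  0xFB4F,  DEFAULT_BIDI_R  },
  { 0xFB50,  0xFDFF,  DEFAULT_BIDI_AL },  /* Arabic Presentation Forms-A */
  { 0xFE70,  0xFEFF,  DEFAULT_BIDI_AL },  /* Arabic Presentation Forms-B */
  { 0x10800, 0x10FFF, DEFAULT_BIDI_R  },  /* Cypriot, Phoenician, ... */
  { 0x1E800, 0x1EFFF, DEFAULT_BIDI_R  },
};

/* Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane. */
static int
is_noncharacter (uint32_t c)
{
  return (c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE;
}

/* Default Bidi_Class for a code point with no explicit entry (hypothetical). */
default_bidi_t
default_bidi_class_for_unassigned (uint32_t c)
{
  /* Per the quoted comment, noncharacters are excluded from the AL
     ranges, so they take the "all other cases" value, L.  */
  if (is_noncharacter (c))
    return DEFAULT_BIDI_L;
  for (size_t i = 0; i < sizeof default_ranges / sizeof default_ranges[0]; i++)
    if (c >= default_ranges[i].lo && c <= default_ranges[i].hi)
      return default_ranges[i].cls;
  /* All code points not explicitly listed default to L.  */
  return DEFAULT_BIDI_L;
}
```

A sorted range table with a linear scan keeps the sketch easy to check against the quoted comment. A real implementation could instead bake these defaults into the generated char-table, which is the option Eli raised.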

---
Kenichi Handa
address@hidden


