Re: bidi properties from uniprop tables
From: Eli Zaretskii
Subject: Re: bidi properties from uniprop tables
Date: Fri, 19 Aug 2011 13:36:22 +0300
> From: "Stephen J. Turnbull" <address@hidden>
> Cc: address@hidden,
> address@hidden
> Date: Fri, 19 Aug 2011 18:15:58 +0900
>
> Unassigned characters are given strong types in the
> algorithm. This is an explicit exception to the general Unicode
> conformance requirements with respect to unassigned characters. As
> characters become assigned in the future, these bidirectional
> types may change. For assignments to character types, see
> DerivedBidiClass.txt [DerivedBIDI] in the [UCD].
Thanks, I've managed to miss that addition to the UBA.
> Since Bidi_Class is only used in this algorithm (and explicit property
> lookups) AFAIK
That's not true: it is also used in regexp search by category. So we
should decide whether to assign these types in the uniprop table, or
have a fallback for them in bidi.c. Any opinions? Handa-san?
> it seems reasonable to me that get-char-code-property
> et amis should return the "strong type" specified by DerivedBIDI
> (which is LTR it seems, but you should check that).
No, the type depends on the block:
# Unlike other properties, unassigned code points in blocks
# reserved for right-to-left scripts are given either types R or AL.
#
# The unassigned code points that default to AL are in the ranges:
# [\u0600-\u07BF \uFB50-\uFDFF \uFE70-\uFEFF]
#
# Arabic: U+0600 - U+06FF
# Syriac: U+0700 - U+074F
# Arabic_Supplement: U+0750 - U+077F
# Thaana: U+0780 - U+07BF
# Arabic_Presentation_Forms_A:
# U+FB50 - U+FDFF
# Arabic_Presentation_Forms_B:
# U+FE70 - U+FEFF
# minus noncharacter code points.
#
# The unassigned code points that default to R are in the ranges:
# [\u0590-\u05FF \u07C0-\u08FF \uFB1D-\uFB4F \U00010800-\U00010FFF
# \U0001E800-\U0001EFFF]
#
# Hebrew: U+0590 - U+05FF
# NKo: U+07C0 - U+07FF
# Cypriot_Syllabary: U+10800 - U+1083F
# Phoenician: U+10900 - U+1091F
# Lydian: U+10920 - U+1093F
# Kharoshthi: U+10A00 - U+10A5F
# and any others in the ranges:
# U+0800 - U+08FF,
# U+FB1D - U+FB4F,
# U+10840 - U+10FFF,
# U+1E800 - U+1EFFF
#
# For all other cases:
# All code points not explicitly listed for Bidi_Class
# have the value Left_To_Right (L).
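To make the fallback option concrete, here is a rough sketch of the default lookup those comments describe. It is written in Python only for brevity and is not code from bidi.c or the uniprop tables; the names AL_RANGES, R_RANGES, is_noncharacter and default_bidi_class are hypothetical, and the ranges are copied from the comment quoted above. It only answers "what is the default for a code point the table does not list"; assigned characters would still get their type from the table.

```python
# Sketch of the default Bidi_Class for code points that have no explicit
# entry, following the ranges quoted above from DerivedBidiClass.txt.
# All names here are hypothetical; this is not Emacs code.

# Ranges whose unlisted code points default to AL (Arabic Letter).
AL_RANGES = [
    (0x0600, 0x07BF),   # Arabic, Syriac, Arabic_Supplement, Thaana
    (0xFB50, 0xFDFF),   # Arabic_Presentation_Forms_A
    (0xFE70, 0xFEFF),   # Arabic_Presentation_Forms_B
]

# Ranges whose unlisted code points default to R (Right_To_Left).
R_RANGES = [
    (0x0590, 0x05FF),   # Hebrew
    (0x07C0, 0x08FF),   # NKo and the rest of U+0800 - U+08FF
    (0xFB1D, 0xFB4F),
    (0x10800, 0x10FFF),
    (0x1E800, 0x1EFFF),
]


def is_noncharacter(cp):
    """Return True for Unicode noncharacter code points."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_ranges(cp, ranges):
    """Return True if cp falls inside any inclusive (lo, hi) pair."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def default_bidi_class(cp):
    """Default type for a code point with no explicit Bidi_Class entry."""
    # The AL ranges apply "minus noncharacter code points".  Letting
    # noncharacters fall through to L is an assumption of this sketch;
    # the data file may well list them explicitly with another type.
    if in_ranges(cp, AL_RANGES) and not is_noncharacter(cp):
        return "AL"
    if in_ranges(cp, R_RANGES):
        return "R"
    return "L"   # "For all other cases"
```

Whether such a function lives in bidi.c or is used once to pre-fill the uniprop table is exactly the question above; the lookup itself is the same either way.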