emacs-devel

Re: Composing Hebrew diacriticals


From: Yair F
Subject: Re: Composing Hebrew diacriticals
Date: Fri, 7 May 2010 13:00:16 +0300

On Fri, May 7, 2010 at 9:23 AM, Kenichi Handa <address@hidden> wrote:

> If only diacritical marks are composed, and they may be
> placed on any base character, it is better to set that kind
> of list only for the Hebrew diacriticals, for efficiency.  So
> the code will be something like this:
>
> (let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
>       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
>   (dolist (elt hebrew-diacriticals-list)
>     ;; ELT is a (FROM . TO) range of diacritical code points.
>     (set-char-table-range composition-function-table elt
>       (list (vector regexp 1 'font-shape-gstring)))))
>
> Here "1" is for moving back one character to check matching
> with REGEXP.
>
>>> There are some restrictions on which characters are allowed to be composed.
>
> If those restrictions are more rigid, the regexp should vary
> for each diacritical mark.
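
To make the "1" in the quoted code concrete: the idea is that when a diacritical is found at position POS, the check backs up one character and requires REGEXP to match starting there. Below is a rough model of that check in plain Emacs Lisp. It is a sketch only: `my-match-with-lookback` and `my-simple-regexp` are hypothetical names for illustration, not Emacs functions, and the base and mark ranges are deliberately simplified.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-match-with-lookback (regexp lookback string pos)
  "Return non-nil if REGEXP matches STRING starting LOOKBACK chars before POS."
  (let ((start (- pos lookback))
        (case-fold-search nil))
    (and (>= start 0)
         ;; `string-match' searches forward from START, so require that
         ;; the match begins exactly at START.
         (eq (string-match regexp string start) start))))

;; Simplified pattern (an assumption): any Hebrew letter followed by
;; one niqud mark.
(defvar my-simple-regexp "[\u05D0-\u05EA][\u05B0-\u05BB]")

;; Bet + patah: the mark is at index 1, its base one character before it.
(my-match-with-lookback my-simple-regexp 1 "\u05D1\u05B7" 1)

;; A Latin base does not satisfy the pattern, so this returns nil.
(my-match-with-lookback my-simple-regexp 1 "a\u05B7" 1)
```

With stricter rules, each range of marks would get its own pattern in place of `my-simple-regexp`.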

This is the composition regexp (I added whitespace and comments for readability):

\\(
[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  ;; base
  [\u05BC\u05BF]?                ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
\u05D5                           ;; base
  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
  [\u05B0-\u05BB\u05C7]?         ;; 0-1 marks of extended 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
\u05E9                           ;; base
  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
  [\u05C1\u05C2]?                ;; 0-1 marks of 2nd class (shin dot)
  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\)

What would be the best way in this case?
In the most extreme case there are 6 marks attached to a base character
(shin with dagesh, shin dot, niqud, and three 4th-class marks).
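
For anyone who wants to experiment with it, the pattern above might be written as an actual Emacs Lisp string roughly like this. It is only a sketch: the names `my-hebrew-4th-class`, `my-hebrew-composition-regexp` and `my-hebrew-cluster-length` are hypothetical, and the code only tests the regexp against strings; it does not touch `composition-function-table`.

```elisp
;; Sketch only; all `my-' names are hypothetical, for illustration.
(defconst my-hebrew-4th-class "[\u0591-\u05AF\u05BD]*"
  "Zero or more marks of the 4th class.")

(defconst my-hebrew-composition-regexp
  (concat
   "\\("
   ;; General base letters (everything except vav and shin).
   "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
   "[\u05BC\u05BF]?"                ; 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ; 3rd class (niqud)
   my-hebrew-4th-class
   "\\|"
   ;; Vav, which accepts the extended niqud range.
   "\u05D5"
   "\u05BC?"                        ; 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?"         ; extended 3rd class (niqud)
   my-hebrew-4th-class
   "\\|"
   ;; Shin, which additionally accepts a shin dot.
   "\u05E9"
   "\u05BC?"                        ; 1st class (dagesh)
   "[\u05C1\u05C2]?"                ; 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"   ; 3rd class (niqud)
   my-hebrew-4th-class
   "\\)")
  "Base character followed by its allowed marks.")

(defun my-hebrew-cluster-length (string)
  "Return the length of the cluster at the start of STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match (concat "\\`" my-hebrew-composition-regexp) string)
      (match-end 0))))

;; Shin with 6 marks: dagesh, shin dot, qamats, and three 4th-class marks.
(my-hebrew-cluster-length "\u05E9\u05BC\u05C1\u05B8\u05BD\u0591\u0592") ; => 7

;; A shin dot is not allowed on bet, so only the base is matched.
(my-hebrew-cluster-length "\u05D1\u05C1") ; => 1
```

Writing each class as its own string keeps the comments next to the part they describe, which the single-string form cannot do.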



