emacs-devel

Re: Compositions and bidi display


From: Kenichi Handa
Subject: Re: Compositions and bidi display
Date: Fri, 30 Apr 2010 21:12:04 +0900

I'll reply to this before replying to your previous mail.

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > Note that composition_compute_stop_pos just finds a stop
> > position to check, and the actual checking and composing is
> > done by composition_reseat_it which is called by
> > CHAR_COMPOSED_P.

> But it looks like composition_compute_stop_pos does use at least some
> validation for the candidate stop position.  AFAIU, this fragment
> finds and validates a static composition:

>   if (find_composition (charpos, endpos, &start, &end, &prop, string)
>       && COMPOSITION_VALID_P (start, end, prop))
>     {
>       cmp_it->stop_pos = endpos = start;
>       cmp_it->ch = -1;
>     }

> So it looks like COMPOSITION_VALID_P is the proper way of validating a
> position that is a candidate for a static composition.  Is that true?

Yes.

> If it is true, then the end point of the static composition is given
> by the `end' argument to find_composition,

Yes.

> and all we need is record it in cmp_it.

Record it for what purpose?

Anyway, we call COMPOSITION_VALID_P here so that we can
avoid calling it again in composition_reseat_it.  But for
automatic composition, the checking and the actual composing
happen at the same time.  So, even if we did the check in
composition_compute_stop_pos, composition_reseat_it would
have to do it again (for the actual composing).

> And the loop after that, conditioned on auto-composition-mode, seems
> to do a similar job for automatic compositions.  Omitting some
> secondary details, that loop does this:

>   while (charpos < endpos)
>     {
>       [advance to the next character]
>       val = CHAR_TABLE_REF (Vcomposition_function_table, c);
>       if (! NILP (val))
>       {
>         Lisp_Object elt;

>         for (; CONSP (val); val = XCDR (val))
>           {
>             elt = XCAR (val);
>             if (VECTORP (elt) && ASIZE (elt) == 3 && NATNUMP (AREF (elt, 1))
>                 && charpos - 1 - XFASTINT (AREF (elt, 1)) >= start)
>               break;
>           }
>         if (CONSP (val))
>           {
>             cmp_it->lookback = XFASTINT (AREF (elt, 1));
>             cmp_it->stop_pos = charpos - 1 - cmp_it->lookback;
>             cmp_it->ch = c;
>             return;
>           }
>       }
>     }

> This looks as if a position that is a candidate for starting a
> composition sequence should have a non-nil entry in
> composition-function-table for the character at that position, and
> that entry should specify the (relative) character position where the
> sequence might start.  Is my understanding correct?

Mostly, but not quite accurate.  The correct statement is "A
position that will be composed with the following and/or the
preceding characters should have a non-nil entry in ...".

The reason we don't record every character that can start a
composition is efficiency (for instance, it is enough to
record only combining characters (U+0300...U+03FF) in
composition-function-table).
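
The lookup described above can be illustrated with a minimal sketch.
It is not the Emacs code: toy_lookback is a hypothetical stand-in for
composition-function-table that registers only the combining range
mentioned above with a lookback of 1, and toy_compute_stop_pos is a
hypothetical simplification of the quoted loop that works on a plain
array of code points.

```c
#include <stddef.h>

/* Hypothetical stand-in for composition-function-table: return the
   lookback count registered for C, or -1 if C has no entry.  Only
   the combining range from the text is registered; the base
   characters that start a composition are deliberately absent.  */
static int
toy_lookback (unsigned c)
{
  return (c >= 0x0300 && c <= 0x03FF) ? 1 : -1;
}

/* Scan TEXT[CHARPOS..ENDPOS) and return the first candidate stop
   position, i.e. the position of a registered character minus its
   lookback, provided that does not reach back before START.
   Return ENDPOS when no candidate exists.  */
static ptrdiff_t
toy_compute_stop_pos (const unsigned *text, ptrdiff_t charpos,
                      ptrdiff_t endpos, ptrdiff_t start)
{
  for (ptrdiff_t pos = charpos; pos < endpos; pos++)
    {
      int lookback = toy_lookback (text[pos]);

      if (lookback >= 0 && pos - lookback >= start)
        return pos - lookback;
    }
  return endpos;
}
```

For the array { 'a', 'e', 0x0301, 'b' } scanned from position 0, the
candidate stop position is 1 (the 'e'), even though only U+0301 has a
table entry.  The sketch only finds a candidate; it does not check or
build the composition.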

> > To move from one composition position to the next, we must actually
> > call autocmp_chars and find where the current composition ends, then
> > start searching for the next composition.

> It is true that the code looking for stop position that might begin an
> automatic composition does not compute the end of the sequence.  That
> end is computed by autocmp_chars.  But what does this mean in
> practice?  Suppose we have found a candidate stop_pos, marked by S
> below:

>      abcdeSuvwxyz

> First, a composition sequence cannot be shorter than 2 characters,
> right?

No, a single character can be composed.

> So the next stop_pos cannot be before v.  Now suppose that the
> actual composition sequence is "Suvw", and we issue the next call to
> composition_compute_stop_pos at v -- are you saying that it will
> suggest that v is also a possible stop_pos, even though it is in the
> middle of a composition sequence?  --- (Q1)

Yes, that happens in Indic scripts.  Actually, a line
starting with "Suvw" and a line starting with "vw" can have
different compositions at BOL.  But, AFAIK, none of the R2L
scripts (Arabic, Dhivehi, Hebrew) has such a characteristic.
So, in an ad hoc way, we can say that your (Q1) is false.  So,

> If not, then repeated calls to
> composition_compute_stop_pos in the bidi case, without calling
> composition_reseat_it in between, will just be slightly
> more expensive because they will need to examine more positions.  Is
> this analysis correct?

it is correct, but only empirically.  There may be a script,
somewhere between the Indic and Arabic regions, that uses the
same writing system as Devanagari but is written R2L.  I
have no idea.

> > But composition_reseat_it also needs ENDPOS

> We can use IT_CHARPOS + MAX_COMPOSITION_COMPONENTS as ENDPOS, if we
> call composition_reseat_it and composition_compute_stop_pos in the
> forward direction repeatedly, can't we?  That's because, when the
> iterator is at some position, we are only interested in compositions that
> cover that position.

No.  Doing it that way slows down the display of a buffer
that has no composition at all.  For such a buffer,
composition_compute_stop_pos should set cmp_it->stop_pos to
the actual endpos so that CHAR_COMPOSED_P quickly returns
zero.
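
The cost difference can be shown with a minimal sketch.  It assumes
that the quick test amounts to comparing the current position with
cmp_it->stop_pos; struct toy_cmp_it and both functions are
hypothetical, and the window of 16 used in the note below is chosen
only for illustration.

```c
#include <stddef.h>

/* Hypothetical, reduced form of the composition iterator state.  */
struct toy_cmp_it
{
  ptrdiff_t stop_pos;
};

/* Cheap per-character test: only a position equal to stop_pos
   needs the expensive check-and-compose step.  */
static int
toy_needs_check (const struct toy_cmp_it *cmp_it, ptrdiff_t charpos)
{
  return charpos == cmp_it->stop_pos;
}

/* Walk a buffer of ENDPOS characters that contains no composition
   and count how often the expensive step is reached when stop_pos
   is never set further ahead than WINDOW characters.  */
static int
toy_count_checks (ptrdiff_t endpos, ptrdiff_t window)
{
  struct toy_cmp_it it;
  int checks = 0;

  it.stop_pos = window < endpos ? window : endpos;
  for (ptrdiff_t pos = 0; pos < endpos; pos++)
    if (toy_needs_check (&it, pos))
      {
        checks++;   /* real code would rescan for the next stop here */
        it.stop_pos = pos + window < endpos ? pos + window : endpos;
      }
  return checks;
}
```

With a 1000-character buffer, toy_count_checks (1000, 16) reaches the
expensive step 62 times, while toy_count_checks (1000, 1000), where
stop_pos is the actual endpos, never reaches it.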

> > We don't have to re-calculate ENDPOS each time.  It must be
> > updated only when we pass over bidi boundary.

> Btw, can we always assume that all the characters of a composition
> sequence are at the same embedding level?  I guess IOW I'm asking what
> Emacs features are currently implemented based on compositions?

Yes.  I can't think of any situation in which characters
must be composed across a bidi boundary.  First of all, to
what embedding level would such a composition belong?

> Obviously, all the characters in a sequence that produces a single
> grapheme must have the same level, but what about compositions that
> produce several grapheme clusters -- can each of the clusters have
> different bidirectional properties?

It is possible to set up a regular expression in an entry of
composition-function-table to do such a composition.  But I
think we don't have to support such a thing until we are
faced with a concrete example of the necessity (quite
doubtful).

---
Kenichi Handa
address@hidden



