emacs-devel

Re: Compositions and bidi display


From: Kenichi Handa
Subject: Re: Compositions and bidi display
Date: Tue, 27 Apr 2010 21:15:04 +0900

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:

> > So, the bidi reordering must happen after composition handling is
> > done.

> Unfortunately, this is impossible, not without throwing away the
> entire design and current implementation of the bidi reordering, and
> implementing it in a totally different way that will have to be much
> more invasive into the overall design of Emacs display engine.

> The reason is, as you know, that bidi reordering in Emacs is
> conceptually just a replacement for advancing from one character to
> the next during iteration through buffers or strings.  Instead of
> incrementing the character position to the next character, we modify
> the position non-linearly to get to the next character in the visual
> order.  Obviously, this iteration is a lower-level operation than character
> composition.

> In addition, the bidi reordering engine knows nothing about the
> characters it encounters except their bidirectional properties; in
> particular, it doesn't know anything about character compositions, and
> teaching it about them would mean rather serious complications.

> Moreover, the bidirectional properties are in general defined for
> individual characters, not for the composed ones, which is one more
> reason it is very hard to do what you suggest, even if we would turn
> the current design inside out.  For example, we compose Hebrew
> consonants with diacriticals into a single glyph, but that glyph has
> no character codepoint to look up its bidirectional properties in the
> Unicode database.

I think it's possible to apply Unicode's bidi algorithm to
the glyph sequence if each glyph provides a character code
to check for reordering.  For a composition glyph, we can
use the first character of the composed sequence.  But, as
your algorithm is incremental and doesn't cache glyphs, such
a method may slow down the display engine.
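
As a rough illustration of "each glyph provides a character
code", here is a minimal sketch.  The struct glyph_info and
the function glyph_bidi_char are hypothetical names made up
for this example; they are not the real Emacs struct glyph
or any existing function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record for illustration only; this is
   not the real Emacs `struct glyph'.  */
struct glyph_info
{
  int is_composition;     /* nonzero if the glyph stands for a composed sequence */
  int ch;                 /* character code of a non-composed glyph */
  const int *components;  /* composed characters in logical order, or NULL */
  size_t n_components;    /* number of entries in COMPONENTS */
};

/* Return the character whose bidi properties should represent
   glyph G during reordering: the first character of a composed
   sequence, otherwise the glyph's own character.  */
static int
glyph_bidi_char (const struct glyph_info *g)
{
  assert (g != NULL);
  if (g->is_composition)
    {
      assert (g->components != NULL && g->n_components > 0);
      return g->components[0];
    }
  return g->ch;
}
```

With this, a glyph composed from U+05D1 (HEBREW LETTER BET)
and U+05B8 (HEBREW POINT QAMATS) would be reordered
according to the bidi class of U+05D1.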

> So, once composed, these characters cannot be
> reordered by following the UAX#9 algorithm without complications,
> because UAX#9 is explicitly defined to work _before_ any shaping of
> characters for display, see Section 3.5 there.

The example in Section 3.5 is for base characters; it is
not applicable to a sequence of a base character and
combining characters.  First of all, TR9's bidi model is not
incremental, and thus the shaping engine can see the whole
reordering result at once.

In that model, it's possible for the shaping engine to
reverse the order of a base character and its combining
characters after bidi processing, as described in rule L3 of
Section 3.4:

============================================================
L3. Combining marks applied to a right-to-left base
character will at this point precede their base
character. If the rendering engine expects them to follow
the base characters in the final display process, then the
ordering of the marks and the base character must be
reversed.
============================================================
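
In the non-incremental model, that reversal could be
sketched as below.  This is only a toy under loud
assumptions: the whole array is one right-to-left run that
has already been reversed into visual order, uppercase
ASCII letters stand for base characters, and lowercase
letters stand for combining marks.  The function names are
made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Reverse S[FROM..TO] in place; both bounds are inclusive.  */
static void
reverse_span (char *s, size_t from, size_t to)
{
  while (from < to)
    {
      char tmp = s[from];
      s[from++] = s[to];
      s[to--] = tmp;
    }
}

/* Toy version of rule L3.  VISUAL is assumed to be a single
   R2L run in visual order, so each run of marks (lowercase)
   precedes its base (uppercase).  Reverse every "marks + base"
   span so that the marks follow their base again.  */
static void
apply_rule_l3 (char *visual, size_t len)
{
  size_t i = 0;

  assert (visual != NULL || len == 0);
  while (i < len)
    {
      size_t start = i;

      /* Skip the marks that precede the next base character.  */
      while (i < len && islower ((unsigned char) visual[i]))
        i++;
      /* Reverse only if there were marks and a base follows them.  */
      if (i < len && i > start)
        reverse_span (visual, start, i);
      i++;
    }
}
```

For the logical sequence "AaBbCc", the reordered run is
"cCbBaA", and the function above turns it into "CcBbAa".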

So, how can we do that with the current incremental method?

> Therefore, I will need to find and handle sequences of characters to
> be composed as an integral part of next_element_from_buffer, similarly
> to what is already done with face changes there.

> The idea is to detect the situation where the bidi iteration placed us
> into a composable sequence of characters, and when that happens,
> compose them and deliver them as a single display element, and then
> skip the entire sequence, like we do today in the unidirectional
> display.  The tricky part is that today we only detect this when we
> hit the beginning of such a sequence, while moving in the strictly
> increasing order of buffer positions; with bidi reordering we will
> need to detect them from the end of the sequence as well, for when the
> bidi iterator moves backwards or jumps across many character
> positions.

> Is it possible to write a function or macro that will find out, for a
> particular buffer/string position, whether that position is at the end
> or in the middle of a composable sequence of characters, and if so,
> return the character positions of the first and last characters of the
> sequence?  Something like CHAR_COMPOSED_P, but one that looks back in
> the buffer?  If so, could you please help me write such a function?

Here's a rough idea.

(1) Call composition_compute_stop_pos with ENDPOS < CHARPOS
if we are now in an R2L range.  ENDPOS is the start of this
R2L range.  And modify this function to search the
buffer/string backward if ENDPOS < CHARPOS.

For example, suppose that uppercase letters denote Hebrew
consonants, lowercase letters denote Hebrew diacriticals,
and a buffer has the character sequence "AaBbCc".  Then
CHARPOS is the position of 'c' and ENDPOS is the position of
'A'.

(2) Do the same for composition_reseat_it.

(3) Add a member 'direction' to struct composition_it that
records the direction of the context in which the
composition was made.

(4) Modify composition_update_it to update members 'from'
and 'to' of "struct composition_it" in the reverse order if
'direction' is R2L.  Note that a single composition may
contain multiple grapheme clusters.  For instance, it's
possible to write a composition function that accepts
"AaBbCc" (above example) at once and produces a single
composition that contains three grapheme clusters "Aa",
"Bb", and "Cc".
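
To make the backward search of (1) and the reverse stepping
of (4) concrete, here is a toy sketch.  It does not use the
real composition_compute_stop_pos or struct composition_it.
The struct cluster and both functions are hypothetical, and
letter case stands in for the real composition rules
(uppercase for a consonant, lowercase for a diacritical).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounds of one grapheme cluster: [FROM, TO).  */
struct cluster
{
  size_t from;
  size_t to;
};

/* Find the cluster that contains POS, searching backward for
   its base character but never before ENDPOS.  Return 1 and
   fill *OUT on success, 0 if no base is found in range.  */
static int
cluster_around (const char *text, size_t len, size_t pos,
                size_t endpos, struct cluster *out)
{
  size_t from = pos, to;

  assert (text != NULL && out != NULL && pos < len && endpos <= pos);
  /* Step back over diacriticals until we reach a base or ENDPOS.  */
  while (from > endpos && islower ((unsigned char) text[from]))
    from--;
  if (!isupper ((unsigned char) text[from]))
    return 0;
  /* Extend forward over the diacriticals that follow the base.  */
  to = from + 1;
  while (to < len && islower ((unsigned char) text[to]))
    to++;
  out->from = from;
  out->to = to;
  return 1;
}

/* Visit clusters from CHARPOS back toward ENDPOS, storing them
   in OUT in the order visited.  Return how many were stored.  */
static size_t
clusters_backward (const char *text, size_t len, size_t charpos,
                   size_t endpos, struct cluster *out, size_t max_out)
{
  size_t n = 0, pos = charpos;
  struct cluster c;

  while (n < max_out && cluster_around (text, len, pos, endpos, &c))
    {
      out[n++] = c;
      if (c.from == endpos)
        break;                  /* reached the start of the R2L range */
      pos = c.from - 1;         /* continue with the previous cluster */
    }
  return n;
}
```

For "AaBbCc" with CHARPOS at 'c' (index 5) and ENDPOS at 'A'
(index 0), the clusters are visited as "Cc", "Bb", "Aa",
which is the order in which 'from' and 'to' would advance
for an R2L composition.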

To do all of this, perhaps all I need is to know how to
find the correct ENDPOS.  Please tell me how to do that.

---
Kenichi Handa
address@hidden



