[emacs-bidi] Embedding levels of formatting codes
From: Eli Zaretskii
Subject: [emacs-bidi] Embedding levels of formatting codes
Date: Tue, 16 Oct 2001 13:30:33 +0200 (IST)
Not a matter of great importance, but...
UTR#9 doesn't say much about embedding levels that should be assigned
to the formatting codes (LRE, LRO, RLE, RLO, and PDF). The mainline
algorithm simply removes them after computing the embedding levels and
the override direction of each level run:
X9. Remove all RLE, LRE, RLO, LRO, PDF, and BN codes.
Note that an implementation does not have to actually remove the
codes, it just has to behave as though the codes were not present for
the remainder of the algorithm. Conformance does not require any
particular placement of these codes as long as all other characters
are ordered correctly.
See 'Implementation Notes' later in this section for information on
implementing the algorithm without removing the formatting codes.
X10. The remaining rules are applied to each run of characters at the
same level. For each run, determine the start-of-level-run (sor) and
end-of-level-run (eor) type, either L or R. This depends on the higher
of the two levels on either side of the boundary (at the start or end
of the paragraph, the level of the 'other' run is the base embedding
level). If the higher level is odd, the type is R, otherwise it is L.
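To make the X10 wording concrete, here is a minimal Python sketch of the sor/eor computation. It assumes the embedding levels have already been computed and are passed in as a plain list of integers; the function names `level_runs` and `sor_eor` are hypothetical and come neither from UTR#9 nor from any Emacs code.

```python
def level_runs(levels):
    """Split a list of embedding levels into (start, end) runs of equal level."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    return runs


def sor_eor(levels, base=0):
    """Return (start, end, sor, eor) for each level run, following X10.

    At each boundary the higher of the two adjacent levels decides the
    type; at the paragraph edges the 'other' level is the base level.
    """
    result = []
    for start, end in level_runs(levels):
        lvl = levels[start]
        before = levels[start - 1] if start > 0 else base
        after = levels[end] if end < len(levels) else base
        sor = "R" if max(lvl, before) % 2 else "L"
        eor = "R" if max(lvl, after) % 2 else "L"
        result.append((start, end, sor, eor))
    return result
```

For instance, with levels `[0, 0, 1, 1, 0]` and base level 0, the middle run at level 1 gets sor = eor = R, while the level-0 runs next to it get R on the side that touches the level-1 run and L on the side that touches the paragraph edge.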
Clearly, if the codes are removed, it doesn't matter much what levels
are assigned to them. However, the current design for Emacs is to
leave the codes in the buffer, so it does matter for us (I give below
an example of a situation when it matters).
In the short and not very clearly written section of UTR#9 where it
gives recommendations for when the codes are not removed, it says
this:
* In rule X9, instead of removing the format codes, assign the embedding
level to each embedding character, and turn it into BN.
* In rule X10, assign L or R to the last of a sequence of adjacent BNs
according to the eor / sor, and set the level to the higher of the
two levels.
Now, what in the world do they mean by that?? Does anyone have a clue
and can explain this in small words?
Why does it matter? For example, consider a buffer whose contents
are:
here is a small {RLO}TEst{PDF}
which should be displayed as
here is a small tsET
If the user asks to see the formatting codes, do we want them to be
displayed like this:
here is a small {RLO}tsET{PDF}
or like this:
here is a small {PDF}tsET{RLO}
?
My current design would produce the second variant, because both RLO
and PDF are assigned the same embedding level as the characters they
embed. That is, the formatting codes are treated as if they were part
of the embedded text. Is that okay?
Note that a slightly more complex example, like this:
a {LRE}simple {RLO}teST{PDF} which{PDF} see
will be displayed with the formatting codes like this:
a {LRE}simple {PDF}TSet{RLO} which{PDF} see
Does that look good enough?
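The two display variants can be reproduced with a small Python sketch. It is a simplification under stated assumptions, not the Emacs implementation: the text contains only Latin letters, spaces, and `{XXX}` markers standing in for the formatting codes; every character simply receives its explicit embedding level (no weak, neutral, or implicit resolution); the maximum embedding depth is not enforced; and reordering is done by reversing runs from the highest level down to the lowest odd level. The names `tokenize`, `assign_levels`, `reorder`, `show`, and the `codes_inside` switch are all hypothetical.

```python
import re


def tokenize(text):
    """Split text into single characters and {XXX} formatting-code tokens."""
    return re.findall(r"\{(?:LRE|RLE|LRO|RLO|PDF)\}|.", text)


def assign_levels(tokens, base=0, codes_inside=True):
    """Assign an embedding level to every token.

    codes_inside=True gives each code the level of the text it embeds
    (the design described above); False gives it the enclosing level.
    """
    stack, level, levels = [], base, []
    for tok in tokens:
        name = tok[1:-1] if len(tok) > 1 else None
        if name in ("RLE", "RLO", "LRE", "LRO"):
            outer = level
            stack.append(level)
            if name in ("RLE", "RLO"):
                level += 1 if level % 2 == 0 else 2   # next odd level
            else:
                level += 2 if level % 2 == 0 else 1   # next even level
            levels.append(level if codes_inside else outer)
        elif name == "PDF":
            inner = level
            if stack:
                level = stack.pop()
            levels.append(inner if codes_inside else level)
        else:
            levels.append(level)
    return levels


def reorder(tokens, levels):
    """Reverse runs at or above each level, highest down to lowest odd."""
    pairs = list(zip(levels, tokens))
    odd = [lvl for lvl in levels if lvl % 2 == 1]
    if odd:
        for threshold in range(max(levels), min(odd) - 1, -1):
            i = 0
            while i < len(pairs):
                j = i
                while j < len(pairs) and pairs[j][0] >= threshold:
                    j += 1
                if j > i:
                    pairs[i:j] = pairs[i:j][::-1]
                    i = j
                else:
                    i += 1
    return [tok for _, tok in pairs]


def show(text, codes_inside=True):
    """Return the visual order of text with formatting codes left visible."""
    tokens = tokenize(text)
    return "".join(reorder(tokens, assign_levels(tokens, codes_inside=codes_inside)))
```

Under these assumptions, `show("here is a small {RLO}TEst{PDF}")` yields the second variant, with `{PDF}` on the left and `{RLO}` on the right, while passing `codes_inside=False` yields the first variant, where the codes stay in their logical positions around the reversed text.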