emacs-bidi

[emacs-bidi] Embedding levels of formatting codes


From: Eli Zaretskii
Subject: [emacs-bidi] Embedding levels of formatting codes
Date: Tue, 16 Oct 2001 13:30:33 +0200 (IST)

Not a matter of great importance, but...

UTR#9 doesn't say much about embedding levels that should be assigned
to the formatting codes (LRE, LRO, RLE, RLO, and PDF).  The mainline
algorithm simply removes them after computing the embedding levels and
the override direction of each level run:

  X9. Remove all RLE, LRE, RLO, LRO, PDF, and BN codes.

     Note that an implementation does not have to actually remove the
     codes, it just has to behave as though the codes were not present
     for the remainder of the algorithm. Conformance does not require
     any particular placement of these codes as long as all other
     characters are ordered correctly.
     See 'Implementation Notes' later in this section for information
     on implementing the algorithm without removing the formatting codes.

  X10. The remaining rules are applied to each run of characters at the
  same level. For each run, determine the start-of-level-run (sor) and
  end-of-level-run (eor) type, either L or R. This depends on the higher
  of the two levels on either side of the boundary (at the start or end
  of the paragraph, the level of the 'other' run is the base embedding
  level). If the higher level is odd, the type is R, otherwise it is L.
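The following is a minimal Python sketch of the sor/eor computation in X10. It assumes the codes have already been removed, as X9 says. The function name and the (start, end, level, sor, eor) tuples are illustrative only and are not taken from Emacs or from any Unicode reference code.

```python
def level_runs_with_sor_eor(levels, base_level):
    """Rule X10 sketch: split a paragraph into level runs and give each run
    its sor/eor type.

    levels: embedding level of each character left after X9 (codes removed)
    base_level: the paragraph embedding level
    Returns a list of (start, end, level, sor, eor); end is exclusive.
    """
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        # A run ends at the end of the text or where the level changes.
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i

    result = []
    for k, (s, e) in enumerate(runs):
        level = levels[s]
        # At the paragraph edges the 'other' run has the base embedding level.
        before = levels[runs[k - 1][0]] if k > 0 else base_level
        after = levels[runs[k + 1][0]] if k + 1 < len(runs) else base_level
        # The higher of the two levels decides: odd gives R, even gives L.
        sor = "R" if max(level, before) % 2 else "L"
        eor = "R" if max(level, after) % 2 else "L"
        result.append((s, e, level, sor, eor))
    return result


# "here is a small {RLO}TEst{PDF}" after X9: 16 characters at level 0,
# then the 4 overridden characters at level 1.
for run in level_runs_with_sor_eor([0] * 16 + [1] * 4, base_level=0):
    print(run)
# (0, 16, 0, 'L', 'R')
# (16, 20, 1, 'R', 'R')
```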

Clearly, if the codes are removed, it doesn't matter much what levels
are assigned to them.  However, the current design for Emacs is to
leave the codes in the buffer, so it does matter for us (below I give
an example of a situation where it matters).

In the short and not very clearly written section of UTR#9 where it
gives recommendations for when the codes are not removed, it says
this:

  * In rule X9, instead of removing the format codes, assign the embedding 
    level to each embedding character, and turn it into BN. 
  * In rule X10, assign L or R to the last of a sequence of adjacent BNs 
    according to the eor / sor, and set the level to the higher of the
    two levels. 

Now, what in the world do they mean by that??  Does anyone have a clue,
and could someone explain this in small words?

Why does it matter?  For example, consider a buffer with these
contents:

  here is a small {RLO}TEst{PDF}

which should be displayed as

  here is a small tsET

If the user asks to see the formatting codes, do we want them to be
displayed like this:

  here is a small {RLO}tsET{PDF}

or like this:

  here is a small {PDF}tsET{RLO}

?

My current design would produce the second variant, because both RLO
and PDF are assigned the same embedding level as the characters they
embed, so the reordering step (rule L2) that reverses those characters
reverses the codes along with them.  That is, the formatting codes are
treated as if they were part of the embedded text.  Is that okay?
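The following minimal Python sketch shows why the level given to the codes decides between the two variants. It applies rule L2 to the example twice, once with RLO and PDF at the embedded level 1 and once with them at the surrounding level 0. The helper names and the `{XXX}` tokenizing are hypothetical. The levels are written out by hand rather than computed, and the implicit rules and mirroring are ignored.

```python
def tokenize(text):
    """Split text into display units; a "{XXX}" marker stands for one format code."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i) + 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def reorder_line(tokens, levels):
    """Rule L2: from the highest level down to the lowest odd level on the
    line, reverse every maximal sequence of tokens at that level or higher."""
    order = list(range(len(tokens)))  # order[v] = logical index shown at visual slot v
    if levels:
        lowest_odd = min(levels) | 1  # lowest level on the line, rounded up to odd
        for level in range(max(levels), lowest_odd - 1, -1):
            v = 0
            while v < len(order):
                if levels[order[v]] < level:
                    v += 1
                    continue
                w = v
                while w < len(order) and levels[order[w]] >= level:
                    w += 1
                order[v:w] = order[v:w][::-1]
                v = w
    return "".join(tokens[k] for k in order)


tokens = tokenize("here is a small {RLO}TEst{PDF}")
plain = len("here is a small ")  # 16 characters at paragraph level 0

# Codes share the level of the text they embed (the current design).
codes_inside = [0] * plain + [1] * 6
# Codes keep the level of the surrounding text (the other variant).
codes_outside = [0] * plain + [0] + [1] * 4 + [0]

print(reorder_line(tokens, codes_inside))   # here is a small {PDF}tsET{RLO}
print(reorder_line(tokens, codes_outside))  # here is a small {RLO}tsET{PDF}
```

To get the first variant, an implementation would have to give the codes the level of the surrounding text rather than the level of the text they embed.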

Note that a slightly more complex example, like this:

  a {LRE}simple {RLO}teST{PDF} which{PDF} see

will be displayed with the formatting codes like this:

  a {LRE}simple {PDF}TSet{RLO} which{PDF} see

Does that look good enough?
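The following is a simplified, hypothetical Python sketch for the nested example. The levels come from the explicit rules X1 to X8, following the design described above: each LRE, RLE, LRO, or RLO gets the level it opens, and each PDF gets the level it closes. The sketch tracks only the level stack. It does not track the override status, which changes character types but not levels, and it does not enforce the maximum nesting depth. It also assumes that the implicit rules (W1 to I2) leave these levels unchanged, which should hold here because the text outside the override is all Latin letters and spaces. Rule L2 is then applied to the result.

```python
def tokenize(text):
    """Split text into display units; a "{XXX}" marker stands for one format code."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == "{":
            j = text.index("}", i) + 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def explicit_levels(tokens, base_level=0):
    """Simplified X1-X8: explicit embedding levels only.

    Opening codes get the level they open, PDF gets the level it closes,
    so codes share the level of the text they embed.  Override status and
    the maximum depth are not tracked (assumption for brevity)."""
    stack = [base_level]
    levels = []
    for tok in tokens:
        cur = stack[-1]
        if tok in ("{LRE}", "{LRO}"):
            stack.append((cur + 2) & ~1)  # least even level greater than cur
            levels.append(stack[-1])
        elif tok in ("{RLE}", "{RLO}"):
            stack.append((cur + 1) | 1)   # least odd level greater than cur
            levels.append(stack[-1])
        elif tok == "{PDF}":
            levels.append(cur)
            if len(stack) > 1:            # an unmatched PDF is ignored
                stack.pop()
        else:
            levels.append(cur)
    return levels


def reorder_line(tokens, levels):
    """Rule L2: from the highest level down to the lowest odd level on the
    line, reverse every maximal sequence of tokens at that level or higher."""
    order = list(range(len(tokens)))
    if levels:
        lowest_odd = min(levels) | 1
        for level in range(max(levels), lowest_odd - 1, -1):
            v = 0
            while v < len(order):
                if levels[order[v]] < level:
                    v += 1
                    continue
                w = v
                while w < len(order) and levels[order[w]] >= level:
                    w += 1
                order[v:w] = order[v:w][::-1]
                v = w
    return "".join(tokens[k] for k in order)


tokens = tokenize("a {LRE}simple {RLO}teST{PDF} which{PDF} see")
levels = explicit_levels(tokens)
print(reorder_line(tokens, levels))  # a {LRE}simple {PDF}TSet{RLO} which{PDF} see
```

The level-3 run `{RLO}teST{PDF}` is reversed once. The enclosing level-2 run is reversed twice (at levels 2 and 1), so it ends up back in logical order. This is why only the inner pair of codes appears swapped.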


