Re: [emacs-bidi] Embedding levels of formatting codes


From: Behdad Esfahbod
Subject: Re: [emacs-bidi] Embedding levels of formatting codes
Date: Tue, 16 Oct 2001 16:02:23 +0330 (IRT)

On Tue, 16 Oct 2001, Eli Zaretskii wrote:
> Not a matter of great importance, but...
> 
> UTR#9 doesn't say much about embedding levels that should be assigned
> to the formatting codes (LRE, LRO, RLE, RLO, and PDF).  The mainline
> algorithm simply removes them after computing the embedding levels and
> the override direction of each level run:
> 
>   X9. Remove all RLE, LRE, RLO, LRO, PDF, and BN codes.
> 
>      Note that an implementation does not have to actually remove the
>      codes, it just has to behave as though the codes were not present
>      for the remainder of the algorithm. Conformance does not require
>      any particular placement of these codes as long as all other
>      characters are ordered correctly.
>      See 'Implementation Notes' later in this section for information
>      on implementing the algorithm without removing the formatting
>      codes.
> 
>   X10. The remaining rules are applied to each run of characters at
>   the same level. For each run, determine the start-of-level-run
>   (sor) and end-of-level-run (eor) type, either L or R. This depends
>   on the higher of the two levels on either side of the boundary (at
>   the start or end of the paragraph, the level of the 'other' run is
>   the base embedding level). If the higher level is odd, the type is
>   R, otherwise it is L.
> 
> Clearly, if the codes are removed, it doesn't matter much what levels
> are assigned to them.  However, the current design for Emacs is to
> leave the codes in the buffer, so it does matter for us (I give below
> an example of a situation when it matters).

fribidi does not physically remove them either. I first implemented 
the algorithm described in the Implementation Notes of UAX#9, but I 
found a test for which its output was wrong. Roozbeh reported this 
to the UAX#9 editors, and they fixed the Implementation Notes in 3.0.1. 
After that I implemented a nicer algorithm in fribidi, which I think 
everyone can use: I remove the codes logically, that is, from the run 
chain, resolve the levels, then insert the codes back and assign a 
level to each. This last part is what you are interested in.
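
A minimal sketch of that remove, resolve, reinsert flow, under stated
assumptions: this is not fribidi's code, resolve_rest is a hypothetical
callback standing in for the rules after X9, and the level given to a
reinserted code (copy the level of the character before it) is a
placeholder, since the thread does not say which level fribidi assigns.

```python
from typing import Callable, List

FORMAT_CODES = {"LRE", "RLE", "LRO", "RLO", "PDF"}


def resolve_keeping_codes(
    tokens: List[str],
    explicit_levels: List[int],
    resolve_rest: Callable[[List[str], List[int]], List[int]],
    base_level: int,
) -> List[int]:
    # Step 1: take the codes out logically; `tokens` itself is untouched.
    kept = [i for i, tok in enumerate(tokens) if tok not in FORMAT_CODES]

    # Step 2: run the remaining rules as if the codes had never been there.
    resolved = resolve_rest(
        [tokens[i] for i in kept], [explicit_levels[i] for i in kept]
    )

    # Step 3: put the codes back and give each one a level.
    # ASSUMPTION: a code copies the level of the character before it
    # (the base level at the start of the paragraph).
    levels: List[int] = []
    resolved_iter = iter(resolved)
    previous = base_level
    for tok in tokens:
        if tok not in FORMAT_CODES:
            previous = next(resolved_iter)
        levels.append(previous)
    return levels
```

The point of the structure is that resolve_rest never sees a formatting
code, so the codes cannot disturb the level runs it works on.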

> In the short and not very clearly written section of UTR#9 where it
> gives recommendations for when the codes are not removed, it says
> this:
> 
>   * In rule X9, instead of removing the format codes, assign the embedding 
>     level to each embedding character, and turn it into BN. 
>   * In rule X10, assign L or R to the last of a sequence of adjacent BNs 
>     according to the eor / sor, and set the level to the higher of the
>     two levels. 
> 
> Now, what in the world do they mean by that??  Does anyone have a clue
> and can explain this in small words?
> 
> Why does it matter?  For example, consider the buffer whose contents
> is this:
> 
>   here is a small {RLO}TEst{PDF}
> 
> which should be displayed as
> 
>   here is a small tsET
> 
> If the user asks to see the formatting codes, do we want them to be
> displayed like this:
> 
>   here is a small {RLO}tsET{PDF}
> 
> or like this:
> 
>   here is a small {PDF}tsET{RLO}
> 
> ?
> 
> My current design would produce the second variant, because both RLO
> and PDF are assigned the same embedding level as the characters they
> embed.  That is, the formatting codes are treated as if they were part
> of the embedded text.  Is that okay?
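
The two display variants above can be reproduced with a small sketch of
the reordering step (rule L2 of the bidi algorithm: from the highest
level down to the lowest odd level, reverse each contiguous run at that
level or higher). The function name reorder and the token/level lists
are assumptions made for illustration; the levels are written by hand
here rather than computed.

```python
from typing import List, Sequence


def reorder(tokens: Sequence[str], levels: Sequence[int]) -> List[str]:
    """Reverse runs by level, highest level first, down to the lowest odd level."""
    out = list(tokens)
    odd = [lv for lv in levels if lv % 2]
    if not odd:
        return out  # nothing is right-to-left, so nothing moves
    for level in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(out):
            if levels[i] >= level:
                j = i
                while j < len(out) and levels[j] >= level:
                    j += 1
                out[i:j] = out[i:j][::-1]
                i = j
            else:
                i += 1
    return out


tokens = list("here is a small ") + ["{RLO}"] + list("TEst") + ["{PDF}"]

# Codes share the level of the text they embed (the design described above).
inner = "".join(reorder(tokens, [0] * 16 + [1] * 6))

# Codes stay at the surrounding level instead.
outer = "".join(reorder(tokens, [0] * 17 + [1] * 4 + [0]))
```

With these inputs, inner is "here is a small {PDF}tsET{RLO}" and outer
is "here is a small {RLO}tsET{PDF}".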

It is a simple and nice way to handle them, and fribidi currently does 
something like this. It can cause problems if you implement it 
without enough care, though. Consider this test:

  AN ARABIC {LRE}{PDF} 123-456

The correct answer should be this:

  456-123  CIBARA NA

And for this one:

  AN ARABIC {LRE} {PDF} 123-456

The correct answer is this:

  123-456 CIBARA NA

Don't forget to check your algorithm against these two tests.
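
One way to keep these two cases around as a regression check is a tiny
table-driven harness. This is a sketch only: reorder is a hypothetical
function supplied by the implementation under test, taking a logical
string in the notation used here (uppercase for right-to-left letters,
{LRE} and {PDF} as placeholders) and returning the visual string without
the codes. Runs of spaces are collapsed before comparing, which is an
assumption made because the expected strings above do not obviously
preserve the number of spaces.

```python
import re
from typing import Callable, List, Tuple

# (logical input, expected visual output), copied from the two tests above.
CASES: List[Tuple[str, str]] = [
    ("AN ARABIC {LRE}{PDF} 123-456", "456-123  CIBARA NA"),
    ("AN ARABIC {LRE} {PDF} 123-456", "123-456 CIBARA NA"),
]


def squeeze(text: str) -> str:
    # Collapse runs of spaces so only the ordering is compared.
    return re.sub(r" +", " ", text).strip()


def failures(reorder: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case that `reorder` gets wrong."""
    failed = []
    for logical, expected in CASES:
        actual = reorder(logical)
        if squeeze(actual) != squeeze(expected):
            failed.append((logical, expected, actual))
    return failed
```

An empty list from failures(my_reorder) means both cases pass.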


> Note that a slightly more complex example, like this:
> 
>   a {LRE}simple {RLO}teST{PDF} which{PDF} see
> 
> will be displayed with the formatting codes like this:
> 
>   a {LRE}simple {PDF}TSet{RLO} which{PDF} see
> 
> Does that look good enough?

This style looks good enough iff you display each level with a different 
background color to show the layers. Otherwise it is not good: because 
the layer structure is a stack, the viewer will pair the first PDF with 
the LRE and the second PDF with the RLO. If you do not use different 
background colors, then this is the best you can do:

  a {LRE}simple {RLO}TSet{PDF} which{PDF} see
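
The difference between the two displays comes down to which level each
code is given. Below is a simplified sketch of the explicit-level stack
with a switch for that choice. Assumptions: the name explicit_levels and
the codes_outside flag are invented here, the depth limit and the
implicit rules are left out, and the sketch only suits examples like
this one, where all text outside the override is left-to-right.

```python
import re
from typing import List

# Parity required of the level opened by each code (0 = even, 1 = odd).
OPENERS = {"{LRE}": 0, "{LRO}": 0, "{RLE}": 1, "{RLO}": 1}


def explicit_levels(tokens: List[str], base_level: int, codes_outside: bool) -> List[int]:
    stack = [base_level]
    levels = []
    for tok in tokens:
        if tok in OPENERS:
            outer = stack[-1]
            # Least level greater than `outer` with the required parity.
            inner = outer + 1 if (outer + 1) % 2 == OPENERS[tok] else outer + 2
            stack.append(inner)
            levels.append(outer if codes_outside else inner)
        elif tok == "{PDF}":
            inner = stack[-1]
            if len(stack) > 1:  # an unmatched PDF leaves the stack alone
                stack.pop()
            levels.append(stack[-1] if codes_outside else inner)
        else:
            levels.append(stack[-1])
    return levels


text = "a {LRE}simple {RLO}teST{PDF} which{PDF} see"
tokens = re.findall(r"\{[A-Z]{3}\}|.", text)
inside = explicit_levels(tokens, 0, codes_outside=False)
outside = explicit_levels(tokens, 0, codes_outside=True)
```

With codes_outside=False the codes sit at the level they open, so
{RLO} and its {PDF} are reversed together with "teST". With
codes_outside=True they sit at the enclosing level and keep their
logical positions, which corresponds to the display shown just above.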


There is also a collection of very good test data for explicit marks 
in the fribidi CVS version; have a look at it.

  http://sourceforge.net/projects/fribidi/

Yours,
-- 
Behdad
24 Mehr 1380, 2001 Oct 16




