Re: [emacs-bidi] Embedding levels of formatting codes


From: Eli Zaretskii
Subject: Re: [emacs-bidi] Embedding levels of formatting codes
Date: Tue, 16 Oct 2001 18:52:50 +0200

> From: Behdad Esfahbod <address@hidden>
> Date: Tue, 16 Oct 2001 16:02:23 +0330 (IRT)
> 
> fribidi does not physically remove them either.  I first implemented
> the algorithm described in the Implementation Notes of UAX#9, and when
> I found a test for which its output was wrong, Roozbeh reported this
> to them, and they fixed the Implementation Notes in 3.0.1.  After
> that, I implemented a nice algorithm in fribidi that I think everyone
> can use: I remove them logically, I mean from the runs chain, then
> insert them back after resolving levels, and then assign a level to
> them.  This last part is what you are interested in.

Yes, I looked into FriBidi's code some time ago (only after the
design and most of the implementation of my code were done ;-), and saw
this removal followed by insertion.
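
For readers who have not seen that technique, here is a minimal Python
sketch of "remove logically, resolve, then reinsert".  It is not
FriBidi's code.  The names resolve_with_reinsertion and
resolve_implicit are hypothetical, the explicit levels are assumed to
have been computed already, and the rule that a reinserted code takes
the level of the character before it (or the paragraph level if there
is none) is an assumption made only for illustration.

```python
# Explicit directional formatting codes: LRE, RLE, PDF, LRO, RLO.
FORMATTING_CODES = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}


def resolve_with_reinsertion(text, explicit, resolve_implicit, paragraph_level=0):
    """Return one level per character of `text`, formatting codes included.

    `explicit` holds the explicit embedding level of every character.
    `resolve_implicit` is a hypothetical callback: it receives the kept
    characters and their explicit levels and returns their final levels.
    """
    # Step 1: remove the codes logically.  The text itself is not modified;
    # only the indices of the characters that stay are remembered.
    kept = [i for i, ch in enumerate(text) if ch not in FORMATTING_CODES]

    # Step 2: resolve levels as if the codes were not there.
    resolved = resolve_implicit([text[i] for i in kept],
                                [explicit[i] for i in kept])
    if len(resolved) != len(kept):
        raise ValueError("resolver must return one level per kept character")

    # Step 3: reinsert the codes and assign a level to each of them.
    levels = [None] * len(text)
    for i, level in zip(kept, resolved):
        levels[i] = level
    previous = paragraph_level
    for i, level in enumerate(levels):
        if level is None:
            levels[i] = previous  # assumption: inherit from the left neighbor
        else:
            previous = level
    return levels
```

The sketch needs the whole paragraph in hand before step 2 can run,
which is the property that matters for the discussion that follows.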

Unfortunately, this cannot be done in the implementation that I'm
working on, which processes characters one by one, and is called anew
for each character.  There's no ``before'' and ``after'' in this code;
everything is done in-place, and of course I cannot modify the buffer
by removing characters from it (I could copy and then work on the
copy, but that would probably significantly slow down the code).  I
cannot even rely on being called again, since the higher levels
can decide they don't need any more characters (e.g., because the
glyph row is long enough to reach the window margin).  I must do
everything in one go and then repeat it for the next character when
I'm called again.  The only ``memory'' I'm allowed to keep is what is
stored in a special iterator structure used to traverse the buffer in
the visual order; that structure is passed by the caller when my code
is called.
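
To make the constraint concrete, here is a minimal Python sketch of a
"one character per call" interface whose only memory is a structure
owned by the caller.  It is not the Emacs code: BidiIterator and
next_char are hypothetical names, the sketch walks the buffer in
logical order rather than visual order, it tracks explicit embedding
levels only, and reporting a formatting code at the level of the
enclosing embedding is an arbitrary choice made for illustration.

```python
from dataclasses import dataclass, field

LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
# Parity of the level each opening code asks for: even = LTR, odd = RTL.
OPENERS = {LRE: 0, LRO: 0, RLE: 1, RLO: 1}


@dataclass
class BidiIterator:
    """Hypothetical stand-in for the iterator structure passed by the caller."""
    pos: int = 0                                # next buffer position to read
    level: int = 0                              # current embedding level
    stack: list = field(default_factory=list)   # levels of open embeddings


def next_char(buffer, it):
    """Deliver one (character, level) pair, or None at the end of the buffer.

    The buffer is never modified; all state lives in `it`.
    """
    if it.pos >= len(buffer):
        return None
    ch = buffer[it.pos]
    it.pos += 1
    if ch in OPENERS:
        outer = it.level
        it.stack.append(outer)
        # Move to the next higher level with the requested parity.
        it.level = outer + 1 if (outer + 1) % 2 == OPENERS[ch] else outer + 2
        return ch, outer
    if ch == PDF and it.stack:
        it.level = it.stack.pop()               # an unmatched PDF changes nothing
    return ch, it.level
```

Because nothing outside `it` is touched, the caller may stop calling at
any point without leaving anything to clean up.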

That's why it is so much more convenient for me to keep the formatting
codes in the buffer, and let the caller decide what to do with them
(normally, the caller is expected to throw them away and immediately
call the bidi code again, to deliver the next character, since the
normal display mode should be to hide the formatting codes).
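
The caller's side of that contract could look like the following
minimal Python sketch.  The names fill_glyph_row, get_next and
show_codes are hypothetical, and get_next merely stands in for the
bidi code: it returns the next character, or None when nothing is
left.

```python
# Explicit directional formatting codes: LRE, RLE, PDF, LRO, RLO.
FORMATTING_CODES = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}


def fill_glyph_row(get_next, width, show_codes=False):
    """Collect at most `width` characters for one row of the display."""
    row = []
    while len(row) < width:
        ch = get_next()
        if ch is None:
            break                     # nothing more to deliver
        if ch in FORMATTING_CODES and not show_codes:
            continue                  # throw it away and ask again at once
        row.append(ch)
    # Once the row is full, the caller simply stops asking for characters.
    return "".join(row)
```

For example, with source = iter(text) and get_next = lambda:
next(source, None), a call with width 3 consumes only as much of the
text as is needed to collect three visible characters.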

> It is a simple and nice way to handle them, fribidi currently does 
> something like this, but it can cause problems if you implement it 
> without enough care, consider this test:
> 
>   AN ARABIC {LRE}{PDF} 123-456
> 
> The correct answer should be this:
> 
>   456-123  CIBARA NA
> 
> And for this one:
> 
>   AN ARABIC {LRE} {PDF} 123-456
> 
> The correct answer is this:
> 
>   123-456 CIBARA NA

I don't see the difference between these two test cases.  Why should
the single blank between LRE and PDF make any difference here?  Did
you perhaps mean LRO instead of LRE?

(Btw, you don't say, but I assume you meant that all the upper-case
letters in this example are Arabic letters, not Hebrew letters,
right?)

> >   a {LRE}simple {PDF}TSet{RLO} which{PDF} see
> > 
> > Does that look good enough?
> 
> This style looks good enough iff you display each level with a different 
> background color, to show the layers

The colors will probably be offered, but not by default, in
particular because they might make a terrible mess when you have
deeply-nested embeddings, and some other font-lock-derived colors on
top of that.  Even with two embedding levels, it won't be easy to
grasp the meaning of each color, since the mapping between the color
and the embedding level is not obvious just by looking at the colors.

I also assume that people mostly won't want the colors or the
formatting codes, unless the display looks wrong and they cannot fix
it in a couple of simple keystrokes.  So if we get our input methods
right (to offset the few nonsensical effects of UTR#9), I hope the
display of these details will rarely be needed.

> otherwise this is not good, 
> viewer will treat the first PDF, for LRE, and second PDF for RLO, as 
> the layer structure is a stack

I think you rely on users' stack perception too much ;-)  Not everyone
has a stack machine in their mind when they look at
bidirectional text.  Also, don't forget that LREs and PDFs don't need
to be balanced, in which case even the display you suggested (below)
could fool anyone.

To help users without spilling too many colors, we could offer
commands that momentarily put the cursor on the other end of the
embedding, like with blink-matching-open.  We will probably have to do
that anyway, since there are displays which don't support colors.
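
The lookup such a command needs can be sketched in a few lines of
Python.  This is only an illustration, not a proposed implementation:
matching_position is a hypothetical name, and the sketch ignores the
limit on embedding depth.  It returns None when the code under the
cursor has no partner, which is exactly the unbalanced case mentioned
above.

```python
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def matching_position(text, pos):
    """Return the index of the code paired with the one at `pos`, or None.

    None means that `pos` is not a formatting code, or that the code has
    no partner (an embedding never closed, or a PDF with nothing to close).
    """
    pairs = {}
    stack = []                          # positions of embeddings still open
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(i)
        elif ch == PDF and stack:       # a PDF with an empty stack is ignored
            opener = stack.pop()
            pairs[opener] = i
            pairs[i] = opener
    return pairs.get(pos)
```

For the text "a" + LRE + "b" + RLO + "c" + PDF + "d", the PDF at
index 5 pairs with the RLO at index 3, while the LRE at index 1 has no
partner because it is never closed.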

> and if you do not display with 
> different background colors, then this one is the best you can do:
> 
>   a {LRE}simple {RLO}TSet{PDF} which{PDF} see

Alas, this is impossible to produce without lots of risky messing with
the resolved levels generated by UTR#9.  Given the considerations I
described above, it doesn't seem to me like it's worth the hassle,
certainly not as a first approximation.  If there's a public outcry
about this when bidi-capable Emacs is released, we can always do
something later.

> Also there is a collection of very good test data for explicit marks 
> in the fribidi CVS version; have a look at it.

Thanks, I already know about that, and used all of your tests to test
my code.  It passed with flying colors ;-)

Last, but not least, thanks to everyone who gave feedback in this
thread.


