[emacs-bidi] Re: RTL support

From:	Gregg Reynolds
Subject:	[emacs-bidi] Re: RTL support
Date:	Fri, 25 Nov 2005 11:24:19 -0600
User-agent:	Mozilla Thunderbird 1.0 (Macintosh/20041206)
Benjamin Riefenstahl wrote:
> Hi Gregg,
> 

Hi Benny,

> 
> I am not an Emacs developer, and I don't plan to work on this issue
> right now.  I also don't believe that you have brought up new
> arguments to change the decisions about how this is to be done in Emacs
> in the future.  I do think that an occasional check of these ideas is
> a good thing, though.

Fair enough.  However, my original post wasn't intended to be
argumentative, but to ask about some specific design options.  So to be
clear, I don't mean to advocate any particular option at this point,
since I don't know enough about Emacs internals.  The observation that
graphical layout, text reordering (via bidi or any other algorithm), and
shaping are mutually orthogonal applies to any notion of text
processing.

> 
> So this exchange is mostly about: "Would I personally find useful
> software that worked along the lines that you suggest."
> 
Yep; this is my itch.  I happen to think scratching it would benefit
many others, but of course that is (informed) speculation.

> Gregg Reynolds writes:
> 
>>Two things.  One is, directionality is a design choice, not a
>>reflection of some kind of objective reality.
> 
> 
> The fact that Arabic and other scripts are written and read from right
> to left is a design feature of the script that we can't just ignore
> when we implement it in computers, we have to deal with it at some
> level.  The question is at *which* level.

Sorry, I wasn't clear.  I mean that modeling a single script/language as
mono- or bi-directional is a design choice, not a statement of a Law of
Nature.  This might be better expressed by saying the choice of number
polarity - most significant digit (MSD) or least significant digit (LSD)
first in strings - is a design choice.  One could model English text
with LSD-first digit strings if one wanted.
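
To make the polarity point concrete, here is a minimal Python sketch.
It is only an illustration: the function names are hypothetical and
nothing here comes from Emacs or from the Unicode standard.  The same
number can be stored with either digit order, as long as the reader
knows which convention is in force.

```python
def value_msd_first(digits: str) -> int:
    """Interpret a digit string whose first character is the most significant digit."""
    value = 0
    for ch in digits:
        value = value * 10 + int(ch)
    return value


def value_lsd_first(digits: str) -> int:
    """Interpret a digit string whose first character is the least significant digit."""
    value = 0
    for position, ch in enumerate(digits):
        value += int(ch) * 10 ** position
    return value


# "123" stored MSD-first and "321" stored LSD-first denote the same number.
print(value_msd_first("123"), value_lsd_first("321"))
```

Neither function is more "natural" than the other; the polarity is a
convention of the encoding, separate from how the digits are drawn.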

This means, among other things, that "RTL" does not imply "bidi", any
more than "LTR" does.  RTL/LTR refers solely to graphical syntax, not to
an encoding model.

> 
>>There is no necessary relationship between the IO model implemented
>>by an application and the corresponding textual representation,
> 
> 
> Exactly.  Which is why Unicode put the complicated parts into the IO
> model (for human IO) with BIDI reordering, while any software module
> that doesn't have human IO can completely ignore the issue.  The same

I don't understand what you say here.   Unicode as I understand it
doesn't have anything at all to say about IO; it just defines character
semantics and syntax (accent after base char, etc.)  Note that there are
no complicated parts for monodirectional text.  It's the bidi
requirement itself that creates the complication.

Another clarification:  I'm not arguing against bidi support where it is
truly needed, namely in mixed language texts.  Nor am I arguing that
Emacs should not have bidi support - it should, obviously.

I guess the point is that we can get there in stages.  First you
implement RTL layout, then shaping, then bidi.  That way we have
*usable* software without having to wait for bidi support, and
eventually we do have full bidi support.  Vim provides the model: you
can switch on/off RTL layout and Arabic shaping independently; hopefully
someday somebody will add bidi support too.  But in the meantime it is
very useful for working with Arabic text.  I'd simply like for Emacs to
be as useful, since I'm firmly in the Emacs camp when it comes to editors.
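
The staging idea can be sketched in a few lines of Python.  This is a
toy under loudly stated assumptions, not how Vim or Emacs does it: the
names (shape, layout_rtl, render) are hypothetical, the shaping table
covers only two letters (BEH and ALEF), and the naive reversal ignores
combining marks.  It shows only that layout and shaping can be
switched on and off independently, with no bidi reordering anywhere.

```python
# Toy joining data for two letters only; a real shaper needs the full
# Unicode joining tables.  Forms are (isolated, final, initial, medial).
BEH, ALEF = "\u0628", "\u0627"
FORMS = {
    BEH: ("\uFE8F", "\uFE90", "\uFE91", "\uFE92"),   # joins on both sides
    # ALEF joins to the previous letter only, so its initial/medial
    # slots are never selected; they just repeat isolated/final.
    ALEF: ("\uFE8D", "\uFE8E", "\uFE8D", "\uFE8E"),
}
DUAL_JOINING = {BEH}


def shape(text: str) -> str:
    """Pick a presentation form for each letter from its stored neighbours."""
    out = []
    for i, ch in enumerate(text):
        if ch not in FORMS:
            out.append(ch)          # unknown characters pass through unchanged
            continue
        joins_prev = i > 0 and text[i - 1] in DUAL_JOINING
        joins_next = (ch in DUAL_JOINING and i + 1 < len(text)
                      and text[i + 1] in FORMS)
        isolated, final, initial, medial = FORMS[ch]
        if joins_prev and joins_next:
            out.append(medial)
        elif joins_prev:
            out.append(final)
        elif joins_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


def layout_rtl(line: str, width: int) -> str:
    """Lay a line out right to left on a left-to-right cell grid:
    reverse the characters and pad on the left so the line starts
    at the right edge."""
    return line[::-1].rjust(width)


def render(line: str, width: int, rtl: bool = False, shaping: bool = False) -> str:
    """Apply the two independent stages; no bidi reordering is involved."""
    if shaping:
        line = shape(line)          # shape first, while neighbours are in stored order
    if rtl:
        line = layout_rtl(line, width)
    return line
```

With both switches off the line comes back untouched; either switch
can be enabled without the other.  A bidi stage could later be added
as a third, separate step without disturbing these two.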

> goes for most software that directly implements human IO but uses
> pre-fabricated building blocks for it (using e.g. GTK or Qt).
> 
> If OTOH you use visual ordering in the encoding you make life easier
> for a few primitive versions of the IO and complicated for all the
> rest of the software.  Not to mention that it makes it even more
> complicated for more advanced - read: user-friendly - versions of IO.

I don't see how.  Can you provide an example of how this would make
things more complicated?  I mean other than with math routines.  That I
admit is the big problem.  There are ways around it, but that's for
another thread.
> 
> There is a third possibility in our case, using visual order within
> Emacs and only storing the text in logical order.  That is possible in
> a simple text editor (and I am sure there are some of those around).
> But Emacs does a lot more, of course.  Every module in Emacs that
> needs to look at the logical order would have to make the reordering
> anyway.  And as Emacs is about text processing that would probably be

I don't see why.  Example?

> a lot of modules.
> 
> That's

I think we're talking about two separate things.  In my opinion, the
internal encoding used by Emacs is irrelevant, so long as I know what it
is.  I just want RTL layout and Arabic shaping, both of which simply
operate on a string of chars/glyphs.

Actually, even when Emacs has full bidi support, I would still want a
"transparent" mode that will provide a graphical representation of the
true (physical) ordering of the text.

The question of how best to represent text internally is an interesting
one, but I haven't given it much thought.  I do think Emacs did the
right thing by *not* adopting Unicode as its internal representation.

> the choice.  I personally prefer the first way of doing it.
> 
> 
>>More important is that RTL has no necessary relationship to mixed
>>content or bidi reordering.  If you only ever write documents in
>>Arabic (Hebrew, Persian, Pashto, whatever) then why do you need
>>bidi?
> 
> 
> A large part (maybe still a majority) of the people that write Arabic
> and Hebrew on computers write in more than just one language.  This is
> even if you discount numbers and trademarks.

Yes, I've heard this claimed many times, but I've never seen any
evidence to back it up.  My personal experience is that it is simply not
true.  In the Arab world, at least, *most* people do *not* operate in
multiple languages (just like in the US), and from what I've personally
seen they get along fine using Arabic only on a computer, just as most
Americans get along fine using English only.  Even scholarly articles
written in English about Arabic generally use transliteration.  Things
are no different in the Arab world.  When newspapers need to write "CNN"
or "FBI", they transliterate it.  The need for full mixed directional
support is quite specialized, probably everywhere in the world.

Add to that the fact that multilanguage computing w/out bidi support is
quite feasible.  I do it all the time using Vim and even Emacs.


> 
> 
>>To be clear: monolingual Arabic text is not mixed content, whether
>>it contains digit strings or not.  So why should an Arabic user pay
>>the Unicode tax of bidi support?
> 
> 
> A large part of the user base right now does need mixed content.  So

That may be true for the *current* Emacs user base.  Then again, Emacs
has no RTL user base, since Emacs doesn't support RTL.  Whether or not
the potential RTL user base truly needs multilanguage (mixed
directionality) support is a matter of speculation.  But we *know* that
they need RTL layout and shaping, and we also know that RTL layout and
shaping is sufficient to make software useful.

Besides, to me the user base is everybody in the world.  Whoever wants
to use it, should be able to use it.  Lack of bidi support need not
prevent the software from being useful for people who don't need bidi
support.

> you would get the tax of supporting several versions of software, the
> software for people that don't need mixed content and another version
> for people that do.  Even if the first version on its own might be
> cheaper, on the whole this will get more costly.  Not to mention that
> it would end up in a system where the "natives" get the "stupid"
> mono-lingual software and the "experts" and the westerners can afford
> the "intelligent" software for the mixed content.

I guess I wasn't clear - see my note above.  As you note, it wouldn't
make much sense to support two RTL versions of a piece of software, one
with and one without bidi support.  But there would be no reason to do
so; RTL w/out bidi would just be a stage on the way to full bidi
implementation.

It's interesting that you perceive the "intelligent" software as the
stuff with bidi support.  In my experience it is just the opposite:
editors with Unicode bidi support are really stupid, from the end user
point of view.  They are often almost impossible to use, thanks to
bizarro cursor behaviour and the directional ambiguity Unicode
explicitly assigns to characters like punctuation, parens, etc.  I find
Vim much simpler and more user-friendly.
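
For anyone who wants to see that ambiguity directly, the Unicode
character database records a bidirectional class for every character,
and Python's standard unicodedata module exposes it.  A small sketch
(the helper name bidi_class and the sample characters are my own
choices for illustration):

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


# A Latin letter, an Arabic letter, a Hebrew letter, a European digit,
# an Arabic-Indic digit, a parenthesis, a full stop and a space.
samples = ["a", "\u0628", "\u05D0", "1", "\u0661", "(", ".", " "]
for ch in samples:
    print(f"U+{ord(ch):04X}  {bidi_class(ch):3}  {unicodedata.name(ch)}")
```

Letters report a strong class (L, R or AL), while the parenthesis
reports ON ("other neutral"): it has no direction of its own, so a
bidi implementation has to infer one from context.  That inference is
where the odd behaviour in mixed text comes from.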

Somewhere in the GCC list archives there's a note from RMS in response
to an issue involving support for some obscure feature of the ISO C++
standard (if I recall correctly), in which he says it all in a very few
words, something along the lines of "Standards are recommendations; we
should design to meet the needs of our community; if the Standard helps
with that, then we support it, but if not we shouldn't hesitate to
ignore it and do what is best for the community."  Software support for
Unicode RTL scripts provides a classic example of getting things
backwards - designing to satisfy the standard instead of community needs.

In summary, there's more than one way to skin a cat, as the (American)
saying goes.  Emacs (and other software) can be quite useful to RTL
users without bidi support.  It's better to have bidi support,
naturally, but the cost of bidi implementation need not stand in the way
of providing useful stuff, and providing useful stuff by supporting
non-bidi RTL and shaping need not inhibit implementation of bidi support.

thanks,

gregg