Re: Buffer names with R2L characters

emacs-devel

From:	Eli Zaretskii
Subject:	Re: Buffer names with R2L characters
Date:	Thu, 23 Jun 2011 12:16:01 +0300

> From: Stefan Monnier <address@hidden>
> Date: Wed, 22 Jun 2011 18:27:14 -0400
> 
> Maybe another way to attack the problem is to say that the < and the >
> in that string are not neutral but "weak L2R" or something like that.

There's no "weak L2R" bidi type or category in UAX#9.  Weak types
include numbers (i.e. digits) and "number separators" (plus and
minus).  Changing the type of '<' and '>' to number separator will not
gain us anything, because these separators are treated the same as
neutrals, except when they are between two numbers.  Changing the type
to numbers could probably solve the problem, but runs the risk of
getting us in more trouble, since the treatment of numbers makes sense
only for numbers.
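
A minimal sketch for checking the bidi classes discussed above. It uses Python's standard unicodedata module rather than Emacs's own char-table, so it only reports what the Unicode Character Database says; the helper name bidi_classes is hypothetical. In the output, ON is "other neutral", ES and CS are the separator classes, and EN is a European digit.

```python
import unicodedata


def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


# Punctuation seen in buffer names, plus a digit for comparison.
classes = bidi_classes("<>~|-+/5")
for ch, cls in classes.items():
    print(repr(ch), cls)
```

The angle brackets, '~' and '|' come out as neutrals, while '-' and '+' are separators and '5' is a number, which matches the distinction drawn above.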

> Maybe this would also work for XML markup.

It won't.  In fact, it could make things worse.  To see it, take the
first example in this article:

   http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html

the one that uses Arabic, copy/paste it into *scratch* in Emacs 24
with bidi-display-reordering turned on, and replace every '<' and '>'
there with either '-' (a number separator) or a digit.  The result is
still unreadable gibberish, and in the case of digits it's even less
readable.

> We could specify such a thing via some char-table overriding the
> default bidi properties of specified chars.  We would either need to be
> able to set this as a text-property over the "<N>", or to have one for
> the mode-line.

First, there's no need to invent another char-table.  The bidi types
used by bidi.c are already specified in a char-table, so all you'd
need to do is to modify it (probably its copy).  Assuming we indeed
want to modify the properties of '<' and '>', that is -- which I think
is not a good idea.  (Btw, these two characters are not the only ones
that cause trouble in display of buffer names.  '~' is another one,
and in fact all the punctuation characters behave in the same way.
Are we going to modify the properties of all of them?)

And second, using text properties for overriding bidi properties is
not a good idea at all, because bidi.c works below the level that pays
attention to text properties.  Making it aware of text properties will
slow it down considerably, or require a complete redesign of how the
bidi display works in general, i.e. give up the total separation
between the reordering and the rest of the display engine.  I don't
think we want that on behalf of this relatively minor issue.

Bottom line, using the directional control characters is the best way
of adapting the visual appearance to user expectations when displaying
plain text.

XML and other non-plain-text buffers are a different kind of problem.
There, we would like to display correctly not just text around '>',
but also comments and strings.  The problems there are with all the
punctuation characters near the end of the comments and strings (they
display at the wrong end of the last sentence) and with L2R text
embedded in the otherwise R2L text.  IOW, we would like to have a way
to display such comments and strings as if they were in an R2L
paragraph.  I don't yet know what would be a good solution to that.
In fact, I don't think we have an exhaustive list of situations where
the default reordering causes trouble and must be augmented by
something else.

> >> > I think Eli is wrong here. An example will help, a file with the
> >> > (logical) name "/abc/def GHIK/LMNO qrst" when uniquified will appear
> >> > as: "def ONML|KIHG qrst" which is clearly wrong.
> >> > My way to solve it is as above, i.e. add zero width LRM on both sides
> >> > of the separator (/ or |) in addition to the enclosing LRMs.
> >> I think this is beginning to become gross.
> > But it is a general solution that is easily implemented.
> 
> Indeed, for the buffer names it seems perfectly acceptable since we
> generate them ourselves and they don't go very far.  I'm not sure why
> Eli doesn't like this solution.

I don't like the proliferation of directional marks that this will
bring.  I had hoped that we would need these directional control
characters only very rarely.  These have problems on TTYs, and even in GUI
sessions they are visible by default (as thin spaces), so they will
disrupt the visual appearance and cursor motion.  We will need to have
them everywhere, e.g. in the prompt displayed by read-buffer and in
other places, if we want buffer names to look the same in all
contexts.  But since this is the best available solution, I'm willing
to try; maybe I'm wrong and the results will not be that bad after
all.
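
A minimal sketch of the proposal quoted above: enclose the name in LRM (U+200E) and also put an LRM on each side of every separator. This is an illustration in Python, not the code Emacs or uniquify uses; the function name wrap_with_lrm and the default separator set are assumptions.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, a zero-width strong L2R character


def wrap_with_lrm(name, separators="/|"):
    """Return NAME enclosed in LRMs, with an LRM on both sides of each separator."""
    parts = []
    for ch in name:
        if ch in separators:
            parts.append(LRM + ch + LRM)
        else:
            parts.append(ch)
    return LRM + "".join(parts) + LRM


# Hebrew letters stand in for the upper-case R2L text of the quoted example.
marked = wrap_with_lrm("def \u05d0\u05d1\u05d2|\u05d3\u05d4\u05d5 qrst")
print(len(marked))
```

Each separator costs two extra characters and the enclosing pair costs two more, which gives a feel for how quickly the marks add up.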
