Re: Bidirectional text and URLs


From: Eli Zaretskii
Subject: Re: Bidirectional text and URLs
Date: Sun, 30 Nov 2014 23:05:36 +0200

> From: Lars Magne Ingebrigtsen <address@hidden>
> Date: Sun, 30 Nov 2014 19:13:54 +0100
> Cc: address@hidden
> 
> Because I was wondering whether my suggestion from yesterday (that we
> insert LRO/PDF characters into URLs if there is an LRO present in the
> buffer when recognising URLs) is at all feasible, and from your
> explanation, it seems like it would be.

IMO, you are jumping to solutions too early, without a good
understanding of the real problem.

I also guess that you meant RLO, not LRO.  The latter makes the
embedded text render like strict left-to-right characters, so it
doesn't need any special handling and cannot do any harm in URLs that
use left-to-right characters (which is 99.99% of URLs).

Can we please take a step back and try to identify the real problem
here?  What exactly are we trying to detect and handle?  Is it true
that we are trying to detect URLs whose characters got their "normal"
bidirectional properties overridden by some directional control
characters?  If so, I can write a primitive that will take a region of
buffer text and examine it to detect this.

If it is something else, please tell us what that is, and chances are
you can have it without having to go through a crash course in UBA.

In any case, it is IMO wrong to look for specific controls that you
just happened to learn yesterday.  They are not what you need to look
for, they are just one sign of what you are looking for.  The UBA is
too complex an algorithm, and it keeps evolving, so chances are there
will be more ways to do these tricks.  You need to define what it is
that you are looking for, not search for this or that sign.

Next, given that you have detected the spoofed URL, what do you want
to do with it?  Do you want to highlight it, do you want to de-spoof
(i.e. undo the spoofing) in some way, but still leave some indication
of the fact that it was spoofed, or maybe you want to remove any trace
of the spoofing as if it never happened (and leave the user oblivious
to the fact it did)?

Given the answers to those questions, there are any number of possible
solutions that do NOT require inserting more directional controls.
Some of the possible solutions were already mentioned in this thread.
Here's another: cover the offending RLO with a display property
showing whatever you want -- a warning sign, a smiley, a string made
of a SPC character, anything.  You can try it with your example: you
will see the spoofing gone immediately.  Why is this worse than
inserting directional controls whose effect on the surrounding text
can be far reaching?

> 2) If there is an LRO in the buffer, then, after recognising an URL, it
> is further treated.
> 
> * If it contains no strongly right-to-left characters, we just wrap it
>   in an LRO/PDF pair.  URLs like "http://myspace.com" will then be
>   guaranteed to be displayed reading left-to-right.
> 
> * If the URL is like http://אבג.דהוזחט.קום, we would segment the URL
>   into strongly-left-to-right-with-weak-chars and
>   strongly-right-to-left-with-weak-chars segments.  We wrap each
>   left-to-right-with-weak-chars in LRO/PDF pairs.

This will change how these URLs are displayed, in a way that users
will not like, and personally it sounds to me like another kind of
phishing.

> Emacs already exposes the weak/strong/LTR/RTL status of each character,
> so function to do this LRO/PDF insertion is trivial.  It's like a
> seven-line Elisp function or something.

It's easy to insert them, yes.  But the effect is not what you or our
users necessarily want.  More importantly, there are better ways to
deal with that, provided that we DEFINE WHAT PROBLEMS WE WANT TO
SOLVE, AND HOW.
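
For reference, the segmentation step of the quoted proposal could look roughly like the sketch below.  It is in Python, not Elisp, and it only classifies: it inserts no controls.  The function name segment_by_direction and the rule that weak and neutral characters join the preceding strong run are assumptions made for this illustration.

```python
import unicodedata

def segment_by_direction(url):
    """Split url into (direction, text) runs, where direction is "L" or "R".
    Weak and neutral characters join the run of the preceding strong
    character (hypothetical helper, not an Emacs function)."""
    runs = []
    current = "L"  # assumption: the URL starts in a left-to-right context
    for ch in url:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            direction = "L"
        elif cls in ("R", "AL"):
            direction = "R"
        else:
            direction = current  # weak or neutral: stay with current run
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + ch)
        else:
            runs.append((direction, ch))
        current = direction
    return runs
```

That the classification is this short is not in dispute; the objection is to what wrapping the resulting runs in LRO/PDF does to the display.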

> From what you say, sounds like it would make the display of these URLs
> acceptable for bidi readers, too -- this would be the normal display of
> these URLs, anyway.

No, it isn't.  You cannot get the correct display by overriding the
bidi properties with LRO or its ilk.  You can see the differences by
moving point with C-f.



