emacs-devel

Re: Bidirectional text and URLs


From: Eli Zaretskii
Subject: Re: Bidirectional text and URLs
Date: Mon, 01 Dec 2014 20:22:36 +0200

> From: Lars Magne Ingebrigtsen <address@hidden>
> Cc: address@hidden
> Date: Mon, 01 Dec 2014 18:49:58 +0100
> 
> Eli Zaretskii <address@hidden> writes:
> 
> >> > Anyway, if you want this, please show the API of the function -- what
> >> > it should return and how.
> >> 
> >> Actually, I'm not sure.  :-) Would it make any sense to have a function
> >> like `(displayed-directionality POSITION)' that returns either
> >> `right-to-left' or `left-to-right'?  If so, the URL-finding function
> >> would query about the start of the URL (which would normally be the HTTP
> >> part), and if that's `right-to-left', Here There Be Shenanigans.
> >
> > How is this different from the previous suggestion?
> 
> I'm not sure what you are referring to.

I'm saying that asking about "characters between FROM and TO that were
supposed to be LTR, but were forced to display as RTL" and asking
essentially the same question about the character at POS are really
the same question.  IOW, the same API can satisfy both needs.
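For illustration, a per-position query can be answered by a one-character FROM/TO query.  The following is a minimal Emacs Lisp sketch.  It assumes a FINDER function with the FROM/TO signature proposed below, returning a position or nil.  `my-bidi-overridden-at-p' is a hypothetical name, not an existing Emacs function.

```elisp
;; Hypothetical wrapper.  FINDER is assumed to take FROM and TO and to
;; return the first overridden position in that range, or nil.
(defun my-bidi-overridden-at-p (pos finder)
  "Return t if FINDER reports an override for the character at POS."
  ;; A one-character region [POS, POS+1) asks about POS alone.
  (and (funcall finder pos (1+ pos)) t))
```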

  ;; Proposed API only; the body would be the new primitive.
  (defun bidi-find-overridden-directionality (from to)
    "Return position between FROM and TO where directionality was overridden.

  This function returns the first character position in the specified
  region where there is a character whose `bidi-class' property is `L',
  but which was forced to display as `R' by a directional override,
  and likewise with characters whose `bidi-class' is `R' or `AL'
  that were forced to display as `L'.

  Strong directional characters `L', `R', and `AL' can have their
  intrinsic directionality overridden by directional override
  control characters RLO \(U+202E) and LRO \(U+202D).")

OK?

If you want, the function can return a cons cell (POS . DIR), where
POS is the position and DIR is the intrinsic directionality of the
overridden character, or even a list (POS DIR-ORIG DIR-OVERRIDDEN).
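The following is a minimal sketch of how the (POS . DIR) value could be built.  It assumes the primitive itself reports only a position, and that DIR is then derived from the character's Unicode `bidi-class' via `get-char-code-property'.  The `my-bidi-' names are hypothetical.

```elisp
;; Hypothetical helpers; only `get-char-code-property' is real Emacs API.
(defun my-bidi-intrinsic-direction (char)
  "Return `left-to-right' or `right-to-left' for strong CHAR, else nil."
  (pcase (get-char-code-property char 'bidi-class)
    ('L 'left-to-right)
    ((or 'R 'AL) 'right-to-left)
    ;; Weak and neutral classes (EN, ON, WS, ...) have no strong direction.
    (_ nil)))

(defun my-bidi-pos-and-dir (pos)
  "Return (POS . DIR) for the character at POS, or nil if POS is nil.
DIR is the intrinsic directionality of that character."
  (when pos
    (cons pos (my-bidi-intrinsic-direction (char-after pos)))))
```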

> > No, only RLOs that affect URLs.
> >
> > Specifically, I suggest looking for an RLO before a URL on the same
> > physical line, and a PDF or hard newline after it, and if found,
> > covering the RLO with a display property whose value is e.g. the
> > string " ".  Since merely finding an RLO before the URL doesn't yet
> > mean that it's malicious (other bidirectional controls, which you
> > don't want to know about, can countermand the RLO before it affects
> > the URL display), I suggest augmenting that by checking that the
> > URL's host and domain parts consist of LTR characters whose
> > directionality was overridden.  The latter part is to be done by
> > calling the new primitive mentioned above.
> >
> > Given all this evidence, I think it's pretty much certain that we
> > found our offending RLO.
> 
> If you think that that's sufficient (that we only need to look for
> preceding RLOs on the same line), then this sounds like a good solution
> to me.

We need to look for an RLO on the same line when an LTR character was
forced to display as RTL, and for an LRO in the opposite case.

This will detect the case you demonstrated at the beginning of this
thread.  I don't know of other similar cases, so if you don't either,
I suggest we handle this one and take it from there.
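The following is a rough Emacs Lisp sketch of the heuristic discussed above.  It works in four steps:

1. Find the first overridden character in the host part.
2. Pick RLO or LRO from that character's intrinsic `bidi-class'.
3. Look for that control earlier on the same physical line, noting a PDF or the line end after the URL.
4. Cover the control with a " " display property.

This is a hypothetical sketch, not Emacs code.  All `my-bidi-' names are made up, OVERRIDDEN-P stands in for the proposed primitive, and the caller is assumed to have already located the URL and host bounds.

```elisp
;; Hypothetical sketch; OVERRIDDEN-P stands in for the proposed primitive.
(defconst my-bidi-lro ?\u202D "LEFT-TO-RIGHT OVERRIDE.")
(defconst my-bidi-rlo ?\u202E "RIGHT-TO-LEFT OVERRIDE.")
(defconst my-bidi-pdf ?\u202C "POP DIRECTIONAL FORMATTING.")

(defun my-bidi-override-to-look-for (char)
  "Return the override control that could have flipped strong CHAR."
  (pcase (get-char-code-property char 'bidi-class)
    ('L my-bidi-rlo)            ; LTR char shown as RTL: look for RLO
    ((or 'R 'AL) my-bidi-lro)   ; RTL char shown as LTR: look for LRO
    (_ nil)))

(defun my-bidi-find-override-before (control url-start url-end)
  "Look for CONTROL before URL-START on the same physical line.
Return (CTRL-POS . END-POS), where END-POS is the position of the
first PDF after URL-END on that line, or the line end if there is
no PDF.  Return nil if CONTROL is not found."
  (save-excursion
    (goto-char url-start)
    (let ((ctrl-pos (search-backward (string control)
                                     (line-beginning-position) t)))
      (when ctrl-pos
        (goto-char url-end)
        (let ((eol (line-end-position)))
          (cons ctrl-pos
                (if (search-forward (string my-bidi-pdf) eol t)
                    (1- (point))  ; position of the PDF itself
                  eol)))))))

(defun my-bidi-maybe-neutralize-url (url-start url-end host-start host-end
                                               overridden-p)
  "Cover a suspicious override before the URL with a display string.
OVERRIDDEN-P is called as (OVERRIDDEN-P HOST-START HOST-END) and should
return the first overridden position in the host part, or nil.
Return the position of the covered control character, or nil."
  (let* ((pos (funcall overridden-p host-start host-end))
         (control (and pos (my-bidi-override-to-look-for (char-after pos))))
         (found (and control
                     (my-bidi-find-override-before control url-start url-end))))
    (when found
      ;; Hide the control character behind a one-space display string.
      (put-text-property (car found) (1+ (car found)) 'display " ")
      (car found))))
```

In a real caller, OVERRIDDEN-P would presumably be the new primitive itself.  Taking it as an argument keeps the sketch independent of that primitive's final form.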


