emacs-devel
Re: highlighting non-ASCII characters


From: Ted Zlatanov
Subject: Re: highlighting non-ASCII characters
Date: Fri, 26 Mar 2010 12:35:36 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

On Wed, 24 Mar 2010 20:34:41 +0100 Lennart Borgman <address@hidden> wrote: 

LB> The attached file sets up IDN chars as above. How about defining a
LB> character class [:idnchars:]?

The IDN character class could be useful.  The list changes so rarely
that it can be hard-coded like the POSIX classes IMO.  I think this
would be done in src/regex.c by defining RECC_IDNCHARS for instance.
This could highlight when non-IDN characters are used in a domain name.

But IDN characters are separate from the "confusables" (homoglyphs) we
should discuss, which are much more problematic and more complex because
they are not just a character class.

On Thu, 25 Mar 2010 09:11:35 +0200 Juri Linkov <address@hidden> wrote: 

JL> I think it would be more useful to implement this spec:
JL> http://www.unicode.org/reports/tr39/data/confusables.txt
JL> "Visually Confusable Characters: Provides a mapping for visual
JL> confusables for use in further restricting identifiers for security".

JL> It's very large, but it seems it's still incomplete.  I can't find
JL> a "confusable" mapping for the problem I reported:

JL>   BOX DRAWINGS DOUBLE HORIZONTAL   ->   EQUALS SIGN

We can have a [:confusable:] character class defined in src/regex.c.
That lets us find these characters.  It could be generated from the TXT
database and augmented with our own mappings.  But there's grouping
information, so maybe that should be available too.  For highlighting we
don't need grouping information, but the user would find it useful to
look at a glyph and find out that it looks like 3 other glyphs.  So this
can be in a Lisp-level data structure like a hashtable with list values.
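
A rough sketch of that data structure, written in Python rather than
Emacs Lisp purely for illustration.  It assumes the "source ; target ;
type # comment" line layout of confusables.txt.  The sample lines, the
EXTRA table and the function names are hypothetical placeholders, not
part of any existing Emacs code.

```python
from collections import defaultdict

# Illustrative lines in the confusables.txt layout (not copied verbatim):
# source code point(s) ; target code point(s) ; type # comment
SAMPLE = """\
0441 ;	0063 ;	MA	# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
03F2 ;	0063 ;	MA	# GREEK LUNATE SIGMA SYMBOL -> LATIN SMALL LETTER C
0430 ;	0061 ;	MA	# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
"""

# Hypothetical local additions for mappings missing from the database.
EXTRA = {"\u2550": "="}  # BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN


def parse_confusables(text):
    """Return a dict mapping each source string to its target string."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        mapping[source] = target
    return mapping


def build_groups(mapping):
    """Return a dict: glyph -> sorted list of the other glyphs it resembles."""
    by_target = defaultdict(set)
    for source, target in mapping.items():
        by_target[target].update((source, target))
    groups = {}
    for members in by_target.values():
        for ch in members:
            groups[ch] = sorted(members - {ch})
    return groups


mapping = parse_confusables(SAMPLE)
mapping.update(EXTRA)
groups = build_groups(mapping)
```

The keys of groups are what a [:confusable:] class would need to match
for highlighting, and groups[ch] is the list a user would see when
asking what a given glyph looks like.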

I looked at whitespace.el and it looks generally suitable for this kind
of highlighting.  I can't decide if the work should augment
whitespace.el or if it should be a new library called visible.el
(because the name whitespace.el is so specific).

On Thu, 25 Mar 2010 15:07:04 +0100 Lennart Borgman <address@hidden> wrote: 

LB> To me it looks like IDN is the most important. Is not this a
LB> derivative work from "confusables"?

I think they are separate logically.  TR39 cares about "confusables" in
the context of IDN but Emacs has a wider view as a general text editor,
IIUC.

Ted




