emacs-devel

Re: highlighting non-ASCII characters


From: Ted Zlatanov
Subject: Re: highlighting non-ASCII characters
Date: Mon, 29 Mar 2010 16:05:29 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

On Mon, 29 Mar 2010 16:19:07 -0400 Stefan Monnier <address@hidden> wrote: 

>> Yes, probably.  But that's accidental.  I still think the character
>> classes [:idn:] (revised name from before) and [:confusable:] (or
>> [:homoglyph:]) would make sense as a first step, then we can decide how
>> to highlight them.

SM> The homoglyph data would be a useful starting point for the feature
SM> I imagine, indeed.  But from the message that started this thread, "K"
SM> is a homoglyph, yet highlighting it everywhere doesn't sound like a good
SM> idea, so basically we need to associate with each homoglyph char
SM> a context where it is expected and only highlight it when it appears in
SM> a different context (or maybe rather when it appears in the context of
SM> its peer).

(I had a "lightbulb moment" I should have had long ago: "confusable" is
a character property, while "homoglyph" is a glyph property; thus the
character class should be [:confusable:] and "homoglyph" should be used
in the face name as long as it's not, er, confusing.)

I know the goal is to match in context and I may take whitespace.el as a
guide in this regard, but I have to start with a [:confusable:]
character class.  I'll also add a [:idn:] class as discussed.  Is that
OK or are you concerned about code bloat in regex.c?

Afterwards we can set up the map between each confusable character and
the set of characters it can match; this is also in the data file.  That
lets us look in context and apply the rules I proposed.  So for example
if Cyrillic K is confusable with Roman K and we see Roman characters
around, that's suspicious.  But Cyrillic "zhe" is not confusable with
any Roman characters so it wouldn't be as suspicious.
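A minimal Python sketch of the context rule above, for illustration only. It assumes a hypothetical miniature confusables table (real data would come from the Unicode confusables file) and a crude script guess taken from the first word of each character's Unicode name. The names CONFUSABLES, script_of and suspicious_positions are hypothetical and are not part of any Emacs API.

```python
import unicodedata

# Hypothetical miniature confusables map: each character maps to the set of
# characters it can be mistaken for.  Real data would come from the Unicode
# confusables file (UTS #39); these entries are illustrative only.
CONFUSABLES = {
    "\u041a": {"K"},       # CYRILLIC CAPITAL LETTER KA vs LATIN K
    "\u0430": {"a"},       # CYRILLIC SMALL LETTER A vs LATIN a
    "\u043e": {"o"},       # CYRILLIC SMALL LETTER O vs LATIN o
    "K": {"\u041a"},
    "a": {"\u0430"},
    "o": {"\u043e"},
}
# Cyrillic zhe (U+0416) is deliberately absent: it has no Latin look-alike.


def script_of(ch):
    """Crude script guess: first word of the Unicode character name.
    (An assumption; a real implementation would use the Script property.)"""
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else None


def suspicious_positions(text, window=3):
    """Return indices of confusable characters whose neighbouring letters
    belong to a look-alike's script but not to their own script."""
    hits = []
    for i, ch in enumerate(text):
        peers = CONFUSABLES.get(ch)
        if not peers:
            continue  # not confusable: never flagged, whatever the context
        own = script_of(ch)
        peer_scripts = {script_of(p) for p in peers} - {own}
        lo, hi = max(0, i - window), min(len(text), i + window + 1)
        # Scripts of the letters within `window` positions, excluding ch itself.
        context = {script_of(text[j]) for j in range(lo, hi)
                   if j != i and text[j].isalpha()}
        if own not in context and context & peer_scripts:
            hits.append(i)
    return hits


print(suspicious_positions("O\u041a"))            # Cyrillic KA after Latin O
print(suspicious_positions("\u041a\u043e\u0442"))  # all-Cyrillic word
```

Here a Cyrillic K after a Latin "O" is reported at index 1, while an all-Cyrillic word and a lone zhe are left alone. Requiring that the character's own script be absent from its neighbourhood keeps the Latin half of a mixed pair from being flagged along with the Cyrillic intruder.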

On Mon, 29 Mar 2010 11:48:28 -0700 "Drew Adams" <address@hidden> wrote: 

DA> But it occurred to me that besides different categories of such critters
DA> there might be different levels of fontification details that users might
DA> want to see.

DA> For example, for some users or for some purposes, it might be useful to see
DA> different kinds of quote marks distinguished (e.g. different kinds of curly
DA> quotes that might be homoglyphs or curly vs straight quotes, which are not
DA> homoglyphs). For other users or for other purposes such highlighting would
DA> be a distraction.

I'll set up a flexible mechanism, probably patterned after
whitespace.el, to do this kind of highlighting.  So the users will be
able to extend it if needed.

I don't know about curly vs. straight quotes.  I don't think that's a
significant problem, whereas a Cyrillic K in Roman text can actually
cause problems and security compromises.  I'm not against the idea, I
have just never seen it become an issue, and there are a million ways to
combine quotation marks depending on the context.  What's the specific
case that you're thinking of?

Ted




