
Re: idn.el and confusables.txt


From: Ted Zlatanov
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 08:40:48 -0500
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)

On Sat, 14 May 2011 11:06:52 +0300 Eli Zaretskii <address@hidden> wrote: 

>> It does sound like confusables.txt could be turned into
>> a lisp/international/uni-confusables.el, but I don't know whether there
>> is a large benefit from having it part of Emacs as opposed to having it
>> in GNU ELPA.  As for idn.el, I haven't seen the file, and don't know
>> what uses it, so I can't judge.

EZ> What is idn.el? where can I see it?  And how and where would we like
EZ> to use it?  I searched the relevant threads (which were all spin-offs
EZ> of other threads, which didn't help searching for the info), but
EZ> didn't find any pointers.  Apologies if I missed something.

Both idn.el and confusables.txt can be used by markchars.el to show
suspicious characters, especially in URLs but also in other situations
(e.g. accidentally putting a Cyrillic о instead of the Latin o in e-mail
text).
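
To make the Cyrillic/Latin example concrete, here is a rough sketch of
that kind of check.  It is Python, not markchars.el code; the function
names are made up for illustration, and guessing the script from the
first word of the Unicode character name is only an approximation that
happens to work for Latin and Cyrillic letters.

```python
import unicodedata


def script_of(ch):
    """Crude script guess: first word of the Unicode character name.

    This is an approximation for illustration, not a real script lookup.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def suspicious_chars(word, expected="LATIN"):
    """Return (index, char, script) for letters not in the expected script."""
    return [
        (i, ch, script_of(ch))
        for i, ch in enumerate(word)
        if ch.isalpha() and script_of(ch) != expected
    ]


# "hello" typed with a Cyrillic small letter o (U+043E) at the end
print(suspicious_chars("hell\u043e"))
```

A real implementation would highlight the reported positions in the
buffer rather than print them, and would consult the confusables data
instead of relying on character names.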

The latest markchars.el is in the GNU ELPA, though the primary location
may continue to be nXhtml.

EZ> You see, the uni-*.el files we create out of the Unicode DB are not
EZ> used anywhere in application code, AFAIK.  We use them to display
EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
EZ> not even sure they are organized in a way that makes them useful.

markchars.el could use other Unicode properties if people ask.  But
specifically regarding the ones I'm proposing for inclusion, since we've
started using the GNU ELPA more and markchars.el lives in it, we can put
uni-confusables.el and uni-idn.el in the GNU ELPA instead of the Emacs
trunk.

EZ> So I'd really like to avoid introducing yet another huge table whose
EZ> only effects are to show one more property in "C-u C-x =" and bloat
EZ> the ELisp manual some more.

IMO it's not a huge table and would not bloat the manual significantly
if it were in the trunk.  Each character carries useful extra
information (the characters it can be confused with), which would grow
the char-table if it were included.  Also, the char-table doesn't have to
cover the Asian confusables; I'm not sure anyone would need those.  So
the memory usage is still somewhat uncertain.

EZ> Can we please have some preliminary ideas and design for using the
EZ> "confusables" information and the IDNA protocol in Emacs, before we
EZ> decide whether and how to include them?

We've had literally hundreds of messages on this over the last year,
unfortunately spread over many threads, as you pointed out.  My best
attempt to propose their usage would be as supplements to markchars.el;
I can't think of other uses currently.

EZ> So we are discussing addition of a feature that is only used by an
EZ> unbundled package, and then only to highlight certain characters, is
EZ> that right?

uni-confusables.el and uni-idn.el will define Unicode properties and not
a feature per se.  Thus they are really reference tables and not
functional changes to Emacs.  I point this out because the burden of
including them in the Emacs trunk is not large: some memory usage,
keeping the Unicode data updated, and the conversion scripts.

On Sat, 14 May 2011 11:13:46 +0300 Eli Zaretskii <address@hidden> wrote: 

EZ> AFAIK, admin/unidata/unidata-gen.el can only parse the format of the
EZ> UnicodeData.txt file.  confusables.txt is in different format, so I
EZ> don't think you can reuse unidata-gen.el for that.

OK, I'll set up a converter, to live in the trunk or in the GNU ELPA once
the maintainers decide where uni-confusables.el and uni-idn.el should go.
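
For reference, such a converter might look roughly like the sketch
below.  It is Python rather than the eventual Emacs Lisp, and it assumes
the confusables.txt layout as I understand it: semicolon-separated
fields holding the source code point(s), the target code point(s) and a
type, followed by a "#" comment.  The sample lines are illustrative
rather than copied from a particular release of the file, and
parse_confusables is a made-up name.

```python
def parse_confusables(lines):
    """Map each source string to the string it can be confused with.

    Assumes lines shaped like "SOURCE ; TARGET ; TYPE # comment", where
    SOURCE and TARGET are space-separated hexadecimal code points.
    """
    table = {}
    for line in lines:
        # Drop a leading byte-order mark and any trailing comment.
        line = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


# Illustrative input in the assumed layout, not an excerpt of the real file.
SAMPLE = """\
# sample data
043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
""".splitlines()

table = parse_confusables(SAMPLE)
print(table["\u043e"])
```

The resulting mapping is the kind of data that would end up in a
char-table in uni-confusables.el; the type field is ignored here for
brevity.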

Ted



