emacs-devel

Ispell and unibyte characters


From: Eli Zaretskii
Subject: Ispell and unibyte characters
Date: Sat, 17 Mar 2012 20:46:54 +0200

The doc string of ispell-dictionary-alist says, inter alia:

  Each element of this list is also a list:

  (DICTIONARY-NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
          ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
  ...
  CASECHARS, NOT-CASECHARS, and OTHERCHARS must be unibyte strings
  containing bytes of CHARACTER-SET.  In addition, if they contain
  a non-ASCII byte, the regular expression must be a single
  `character set' construct that doesn't specify a character range
  for non-ASCII bytes.

Why the restriction to unibyte character sets?  This is quite a
serious limitation, given that modern spellers (aspell and
hunspell) use UTF-8 as their default encoding.

The only reason for this limitation I could find is in
ispell-process-line, which assumes that the byte offsets returned by
the speller can be used to compute the character position of the
misspelled word in the buffer.  Are there any other places in
ispell.el that assume unibyte characters?

If ispell-process-line is the only place, then it should be easy to
extend it so that it correctly handles UTF-8 in addition to unibyte
character sets.
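
To make the offset problem concrete, here is a minimal sketch in Python
(not Emacs Lisp, and not code from ispell.el).  It assumes the speller
reports a zero-based byte offset into the line it was sent, and that the
offset falls on a character boundary; the function name and the sample
line are hypothetical.

```python
def byte_offset_to_char_offset(line: str, byte_offset: int,
                               encoding: str = "utf-8") -> int:
    """Map a byte offset in the encoded LINE to a character offset.

    Assumption: byte_offset is zero-based and lies on a character
    boundary; otherwise decoding the prefix raises UnicodeDecodeError.
    """
    encoded = line.encode(encoding)
    # Decode only the bytes before the offset; the number of characters
    # in that prefix is the character offset we want.
    return len(encoded[:byte_offset].decode(encoding))


# Hypothetical line sent to the speller: "naïve café wrod".
line = "na\u00efve caf\u00e9 wrod"

# Where a UTF-8 speller would see the misspelled word, counted in bytes.
byte_offset = line.encode("utf-8").index(b"wrod")

# The corresponding position counted in characters.
char_offset = byte_offset_to_char_offset(line, byte_offset)
```

With a unibyte encoding such as Latin-1 the two offsets coincide, which
is presumably why treating the speller's offset as a character position
works there; with UTF-8 each non-ASCII character before the word shifts
the byte offset further from the character offset.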

In any case, I see no reason to specify CASECHARS, NOT-CASECHARS, and
OTHERCHARS as ugly unibyte escapes, since their usage is entirely
consistent with multibyte characters: they are used to construct
regular expressions and match buffer text against those regexps.  Did
I miss something important?
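
As an illustration of that point, the following sketch uses Python's
`re` module rather than Emacs regexps.  The character class is a
hypothetical German-like CASECHARS value written with ordinary
(multibyte) characters; it is not an actual entry from
ispell-dictionary-alist.

```python
import re

# Hypothetical CASECHARS: a single character-set construct that lists
# the non-ASCII letters individually instead of as unibyte escapes.
casechars = "[a-zA-Z\u00e4\u00f6\u00fc\u00df\u00c4\u00d6\u00dc]"

# A word is taken to be one or more CASECHARS characters.
word_re = re.compile(casechars + "+")

# Sample buffer text: "Die Größe der Tür".
text = "Die Gr\u00f6\u00dfe der T\u00fcr"

# Match the text against the regexp, as one would match buffer text.
words = word_re.findall(text)
```

The regexp is built and matched the same way whether the class holds
ASCII letters only or multibyte characters as well, which is all the
usage described above requires.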

Any comments and pointers to my blunders are welcome.


