Ispell and unibyte characters

emacs-devel

From:	Eli Zaretskii
Subject:	Ispell and unibyte characters
Date:	Sat, 17 Mar 2012 20:46:54 +0200

The doc string of ispell-dictionary-alist says, inter alia:

  Each element of this list is also a list:

  (DICTIONARY-NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
          ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
  ...
  CASECHARS, NOT-CASECHARS, and OTHERCHARS must be unibyte strings
  containing bytes of CHARACTER-SET.  In addition, if they contain
  a non-ASCII byte, the regular expression must be a single
  `character set' construct that doesn't specify a character range
  for non-ASCII bytes.

Why the restriction to unibyte character sets?  This is quite a
serious limitation, given that modern spellers (aspell and
hunspell) use UTF-8 as their default encoding.

The only reason for this limitation I could find is in
ispell-process-line, which assumes that the byte offsets returned by
the speller can be used to compute the character position of the
misspelled word in the buffer.  Are there any other places in
ispell.el that assume unibyte characters?

If ispell-process-line is the only place, then it should be easy to
extend it so that it correctly handles UTF-8 in addition to unibyte
character sets.
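
To make the offset problem concrete, here is a minimal sketch, written in
Python rather than Emacs Lisp only to keep it short.  It assumes the speller
reports the position of the misspelled word as a byte offset into the encoded
line; the helper name byte_to_char_offset is hypothetical and is not part of
ispell.el.

```python
def byte_to_char_offset(line: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into the encoded LINE to a character offset."""
    encoded = line.encode(encoding)
    # Decode only the bytes that precede the reported offset; the number
    # of characters in that prefix is the character offset.
    return len(encoded[:byte_offset].decode(encoding))


line = "el niño esribe"
word = "esribe"

# What a byte-oriented speller would report for the UTF-8 encoded line.
byte_offset = line.encode("utf-8").index(word.encode("utf-8"))

# What is needed to locate the word in a buffer of characters.
char_offset = byte_to_char_offset(line, byte_offset)
```

With a single-byte encoding such as Latin-1 the two offsets coincide, which
is presumably why the assumption went unnoticed.  With UTF-8 they drift apart
by one for every extra byte before the word.  An offset that falls inside a
multibyte sequence makes the decode step raise an error, so real code would
have to decide how to handle that case.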

In any case, I see no reason to specify CASECHARS, NOT-CASECHARS, and
OTHERCHARS as ugly unibyte escapes, since their usage is entirely
consistent with multibyte characters: they are used to construct
regular expressions and match buffer text against those regexps.  Did
I miss something important?
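
As a rough illustration of that point, the sketch below uses Python's re
module as a stand-in for Emacs regexps (the two are not identical, so this
is an analogy only).  The CASECHARS value is a hypothetical Spanish character
set written with literal characters and is not taken from
ispell-dictionary-alist.

```python
import re

# Hypothetical CASECHARS-like character set, written with literal
# non-ASCII characters instead of byte escapes.
CASECHARS = "[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]"
word_re = re.compile(CASECHARS + "+")

text = "El pingüino comió ñoquis."

# Matching operates on characters, so the number of bytes each character
# occupies in any particular encoding does not matter here.
words = word_re.findall(text)
```

Here words contains the four words of the sentence, with the accented
letters intact and the trailing period excluded.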

Any comments and pointers to my blunders are welcome.
