emacs-devel
Re: Bug 130397


From: Geoff Kuenning
Subject: Re: Bug 130397
Date: 29 Apr 2005 02:29:41 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

For those of you who don't know, I've released ispell 3.3.00.  Having
gotten that off my plate, I'm busily working on some improvements that
will go into 3.3.01.  Number one on that list is to redo the
fixispell-a script that I whipped up a few months ago.

Juri points out:

> This approach is quite promising, but it doesn't work sufficiently well
> for non-English languages.  It loses all characters that don't belong
> to the alphabet specified in .aff file.

and:

> But there is another problem.  fixispell-a returns a list of near misses
> only for the last language in the pipe.  It would be better if it
> accumulated a list of near misses from all ispell commands in the pipe.

The former problem is best addressed using Juri's suggestion of
passing the "-w" switch to specify a superset of the word characters.
In addition, in the
new release, the english.aff file includes all of Latin-1 (since
English sometimes adopts accented words and names from other
languages).  The -w switch is still needed, though, to handle things
like the apostrophe, which isn't in all non-English affix files.  I
welcome further suggestions.

The latter problem motivated me to write an entirely new program,
multispell, which does a better job of what fixispell-a attempted.
It's invoked as:

        multispell [ispell-switches] dict1 dict2 dict3

For example:

        multispell -m english deutsch francais

Multispell behaves like ispell -a, but accepts any word that any of
the mentioned dictionaries accept.  If a word is rejected, it combines
suggestions from all dictionaries.  So, for example, sending "wuld" to
the above command produces:

        & wuld 0 7 weld, wild, wold, would, Wald, wild, wund
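
The accept-or-merge behavior described above can be sketched roughly as follows. This is only an illustration in Python, not multispell's actual code: the function name `combine` and the way the seven suggestions are split among the three dictionaries are assumptions made for the example.

```python
def combine(results):
    """Merge per-dictionary verdicts for a single word.

    `results` holds one (accepted, suggestions) pair per dictionary, in
    command-line order.  Returns None if any dictionary accepts the word,
    otherwise the concatenation of every dictionary's suggestions.
    """
    if any(accepted for accepted, _ in results):
        return None
    merged = []
    for _, suggestions in results:
        merged.extend(suggestions)
    return merged


# Hypothetical per-dictionary answers for "wuld"; the split is assumed.
results = [
    (False, ["weld", "wild", "wold", "would"]),  # english
    (False, ["Wald", "wild", "wund"]),           # deutsch
    (False, []),                                 # francais
]
merged = combine(results)
print(", ".join(merged))
```

Concatenating in command-line order keeps each dictionary's own ranking of near misses intact, at the cost of possible repeats.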

This brings me to a question and a discussion point.  The question is
highlighted in the above line: the word "wild" appears as a
suggestion twice, because the English and German dictionaries both
produce it.  Do people think that's a Bad Thing?  I can certainly
write code to suppress the duplicates; I'm just feeling lazy at the
moment. *grin*
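
For what it's worth, suppressing the duplicates while keeping the first occurrence of each suggestion is a small amount of code. A minimal sketch in Python (the helper name `dedupe` is hypothetical, and the comparison is case-sensitive, so "Wald" and "wald" would both survive):

```python
def dedupe(suggestions):
    """Drop repeated suggestions, keeping the first occurrence of each."""
    # dict preserves insertion order, so this keeps the original ranking.
    return list(dict.fromkeys(suggestions))


suggestions = ["weld", "wild", "wold", "would", "Wald", "wild", "wund"]
unique = dedupe(suggestions)
print("%d: %s" % (len(unique), ", ".join(unique)))
```

Note that the miss count reported to the caller would have to be computed after the duplicates are removed.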

The discussion point is a bit more complex.  If you invoke multispell
with:

        multispell -T latin1 -m english deutsch francais

it will fail because the English dictionary doesn't recognize "latin1"
as a valid encoding.  How do people think I should handle these
variations among affix files?  One obvious option would be to make the
-T switch be dictionary-specific in multispell, so you'd write:

        multispell -m -T list english -T latin1 deutsch -T latin1 francais

Another option would be to insist that all affix files follow a common
naming scheme, so that everybody would be willing to accept "latin1"
as an encoding name, and so forth.

From my point of view, both options are bad.  The first requires too
much intelligence on the part of ispell.el.  The second is going to be
hard to enforce.
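
To make the first option concrete, here is a rough sketch in Python of how a dictionary-specific -T could be parsed. It is not multispell's code. It assumes that -T is the only switch taking a value, that a -T applies only to the next dictionary named, and that every other switch is global; the name `split_args` is made up for the example.

```python
def split_args(argv):
    """Split a multispell-style argument list.

    Returns (global_switches, dictionaries), where dictionaries is a list
    of (name, encoding) pairs and encoding is None when no -T preceded
    that dictionary.
    """
    global_switches = []
    dictionaries = []
    pending = None  # encoding waiting for the next dictionary name
    args = iter(argv)
    for arg in args:
        if arg == "-T":
            pending = next(args, None)
            if pending is None:
                raise ValueError("-T requires a value")
        elif arg.startswith("-"):
            global_switches.append(arg)
        else:
            dictionaries.append((arg, pending))
            pending = None
    return global_switches, dictionaries


line = "-m -T list english -T latin1 deutsch -T latin1 francais"
switches, dicts = split_args(line.split())
print(switches, dicts)
```

Even in this simplified form, the caller (ispell.el, say) still has to know which encoding name each affix file expects, which is the objection raised above.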

Opinions are welcomed.
-- 
    Geoff Kuenning   address@hidden   http://www.cs.hmc.edu/~geoff/

Windows XP is the "most reliable Windows ever," which is like saying
that asparagus is "the most articulate vegetable ever."
        -- Dave Barry



