[aspell-devel] Mis-spellings database?
From: Mike C. Fletcher
Subject: [aspell-devel] Mis-spellings database?
Date: Wed, 06 Nov 2002 04:23:53 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1) Gecko/20020826
Does anyone know of a decent mis-spellings database, that is, a mapping
from mis-spelling to correct spelling (or vice versa)? I'm currently
using a 550-item set adapted from:
http://www.actwin.com/rwmack/spelling.htm
It's fine for testing, but I'm looking for something with a few tens of
thousands of entries. Basically, I want to build a "common error
tracking" system into my spell-checker, and would like a corpus of
real-world English data so that I can judge the effectiveness of the new
feature once it is built.
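For concreteness, here is a minimal sketch of the kind of measurement such a corpus would allow. It assumes the corpus is a plain dict from mis-spelling to intended word and that the checker exposes some function returning a ranked suggestion list; `hit_rate`, `toy_suggest` and `top_n` are hypothetical names for illustration, not part of aspell or of my checker.

```python
from typing import Callable, Dict, List


def hit_rate(pairs: Dict[str, str],
             suggest: Callable[[str], List[str]],
             top_n: int = 10) -> float:
    """Fraction of mis-spellings whose intended word is among the
    first top_n suggestions returned by suggest()."""
    if not pairs:
        return 0.0
    hits = 0
    for wrong, right in pairs.items():
        if right in suggest(wrong)[:top_n]:
            hits += 1
    return hits / len(pairs)


def toy_suggest(word: str) -> List[str]:
    """Stand-in for a real suggestion engine (hypothetical data)."""
    table = {
        "recieve": ["receive", "relieve"],
        "teh": ["the", "ten"],
    }
    return table.get(word, [])


# Tiny made-up corpus: mis-spelling -> intended word.
pairs = {
    "recieve": "receive",
    "teh": "the",
    "definately": "definitely",
}

rate = hit_rate(pairs, toy_suggest)
print(f"{rate:.2%} of mis-spellings got their intended word")
```

Running the same function before and after adding the new feature, on the same corpus, would give a directly comparable number.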
BTW, for those following along at home, I've got the phonetic searching
working quite nicely (0.1 to 0.6 s per search on average using Metakit), and it
finds the correct words in the above test-set most of the time. There
are a number of examples that resist phonetic comparison quite strongly,
which accounts for a few of the errors. (Basically high-level rules
mistakenly compress features so that similarly-spelled sub-strings wind
up with entirely different phonetic encodings.) My latest optimisation
of the distance > 1 searches (reversed-word-indexing) also means that a
few entries where errors occur both in the beginning and the end of the
word aren't caught. All in all, about 20 of the errors don't get
their intended word as a suggestion.
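To illustrate why errors at both ends slip through, here is a simplified sketch of the general idea behind indexing words both forwards and reversed. It is an assumption-laden stand-in, not the actual code: it works on plain spellings rather than phonetic encodings, uses sorted Python lists instead of Metakit, and the class name `TwoEndedIndex` and the `key_len` parameter are made up for the example.

```python
from bisect import bisect_left
from typing import Iterable, List, Set


class TwoEndedIndex:
    """Look up candidates that share either their first key_len
    characters or their last key_len characters with the query."""

    def __init__(self, words: Iterable[str], key_len: int = 3) -> None:
        unique = set(words)
        self.key_len = key_len
        self.forward: List[str] = sorted(unique)
        # Reversed spellings, so a shared suffix becomes a shared prefix.
        self.backward: List[str] = sorted(w[::-1] for w in unique)

    @staticmethod
    def _with_prefix(sorted_words: List[str], prefix: str) -> List[str]:
        """All entries of a sorted list that start with prefix."""
        start = bisect_left(sorted_words, prefix)
        matches = []
        for word in sorted_words[start:]:
            if not word.startswith(prefix):
                break
            matches.append(word)
        return matches

    def candidates(self, query: str) -> Set[str]:
        k = self.key_len
        # Words agreeing with the query at the beginning.
        found = set(self._with_prefix(self.forward, query[:k]))
        # Words agreeing with the query at the end.
        reversed_hits = self._with_prefix(self.backward, query[::-1][:k])
        found.update(w[::-1] for w in reversed_hits)
        return found


index = TwoEndedIndex(["receive", "relieve", "definitely", "separate"])
print(index.candidates("recieve"))   # error in the middle: still found
print(index.candidates("ceparete"))  # errors at both ends: nothing found
```

A candidate only enters the (more expensive) distance comparison if it matches the query at one end or the other, which is what makes the search cheap and also what loses the words that are wrong at both ends.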
Anyway, have fun all,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/