[aspell-devel] Mis-spellings database?


From: Mike C. Fletcher
Subject: [aspell-devel] Mis-spellings database?
Date: Wed, 06 Nov 2002 04:23:53 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1) Gecko/20020826

Does anyone know of a decent mis-spellings database, that is, a mapping from mis-spelling to correct spelling (or vice versa)? I'm currently using a 550-item set adapted from:
   http://www.actwin.com/rwmack/spelling.htm
and it's fine for testing, but I'm looking for something with a few tens of thousands of entries. Basically, I want to build a "common error tracking" system into my spell-checker, and would like a corpus of real-world English data so that I can judge the effectiveness of the new feature once it's built.
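
For anyone wanting to run the same kind of measurement, here is a minimal sketch of how a mis-spelling corpus could be scored against a suggester. The tab-separated "mis-spelling, correct word" layout, the function names, and the `suggest` callable are all assumptions made for illustration; none of them come from my checker or from aspell.

```python
import difflib


def parse_pairs(lines):
    """Parse 'misspelling<TAB>correct' lines into (wrong, right) pairs.

    The tab-separated layout is an assumed format, not an existing standard.
    Blank lines and lines starting with '#' are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        wrong, right = line.split("\t", 1)
        pairs.append((wrong.lower(), right.lower()))
    return pairs


def evaluate(pairs, suggest, max_suggestions=10):
    """Count how often the intended word appears among the suggestions.

    `suggest` is any callable mapping a word to a ranked list of candidates.
    Returns (hits, misses), where misses lists the pairs that failed.
    """
    hits = 0
    misses = []
    for wrong, right in pairs:
        if right in suggest(wrong)[:max_suggestions]:
            hits += 1
        else:
            misses.append((wrong, right))
    return hits, misses


def difflib_suggester(wordlist, n=10):
    """Stand-in suggester built on difflib, used only to exercise the harness."""
    def suggest(word):
        return difflib.get_close_matches(word, wordlist, n=n)
    return suggest
```

Calling `evaluate(parse_pairs(open(path)), difflib_suggester(wordlist))` with a real suggester in place of the difflib stand-in gives a hit count plus the list of entries that never received their intended word, which is the number I'd want to track before and after adding the error-tracking feature.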

BTW, for those following along at home, I've got the phonetic searching working quite nicely (0.1 to 0.6 s per search on average using Metakit), and it finds the correct words in the above test-set most of the time. A number of examples resist phonetic comparison quite strongly, which accounts for a few of the errors. (Basically, high-level rules mistakenly compress features, so that similarly-spelled sub-strings wind up with entirely different phonetic encodings.) My latest optimisation of the distance > 1 searches (reversed-word indexing) also means that a few entries where errors occur at both the beginning and the end of the word aren't caught. All in all, about 20 of the errors don't get their intended word as a suggestion.
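
To illustrate why errors at both ends of a word slip through, here is a rough sketch of one way a reversed-word index could work. It is a simplified stand-in, not the actual Metakit-backed code: the class name `TwoEndedIndex`, the fixed `key_len` of 3, and the use of plain spellings instead of phonetic encodings are all assumptions.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class TwoEndedIndex:
    """Index words by their first and last `key_len` characters.

    Hypothetical structure: the suffix bucket is keyed on the start of the
    reversed word, which is the "reversed-word" part of the idea.
    """

    def __init__(self, words, key_len=3):
        self.key_len = key_len
        self.by_prefix = defaultdict(set)
        self.by_suffix = defaultdict(set)
        for w in words:
            self.by_prefix[w[:key_len]].add(w)
            self.by_suffix[w[::-1][:key_len]].add(w)

    def candidates(self, query):
        """Words sharing either the leading or the trailing key with query."""
        k = self.key_len
        front = self.by_prefix.get(query[:k], set())
        back = self.by_suffix.get(query[::-1][:k], set())
        return front | back

    def search(self, query, max_distance=2):
        """Rank the candidates by edit distance and drop the distant ones."""
        scored = [(edit_distance(query, w), w) for w in self.candidates(query)]
        return [w for d, w in sorted(scored) if d <= max_distance]
```

With an index over "separate", a query of "xeparate" is still found through the reversed key, but "xeparatx" matches neither bucket and returns nothing, even though it is only two edits away. That is the trade-off: the candidate set stays small enough to be fast, at the cost of missing words that are damaged at both ends.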

Anyway, have fun all,
Mike

_______________________________________
 Mike C. Fletcher
 Designer, VR Plumber, Coder
 http://members.rogers.com/mcfletch/






