From: Mike C. Fletcher
Subject: Re: [aspell-devel] Mis-spellings database?
Date: Wed, 06 Nov 2002 16:33:41 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1) Gecko/20020826
A considerable number of the problems seem to stem from what may be a faulty phonetic compressor (psy doesn't seem to go to sy, sy doesn't go to si, stand-alone y, i and e are not considered equal, etc.). A few of them come from my optimised search for distance > 1, which takes the tack that you can mis-spell or mis-pronounce either the start or the end of a word, but not both, and still get a decent response.
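To make the "start or end, but not both" idea concrete, here is a rough sketch in Python. It is not the actual code from my spell-checker: the names `edit_distance` and `candidates`, the `anchor` length of 2 and the `max_distance` of 2 are all assumptions chosen for illustration, and it works on raw spellings rather than phonetic keys.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def candidates(word, dictionary, max_distance=2, anchor=2):
    """Return dictionary entries near `word`, closest first.

    Hypothetical pruning rule: an entry is only scored if it shares
    the first `anchor` letters OR the last `anchor` letters with
    `word`. Entries that differ at both ends are never examined.
    """
    head, tail = word[:anchor], word[-anchor:]
    hits = []
    for entry in dictionary:
        if not (entry.startswith(head) or entry.endswith(tail)):
            continue  # assumed wrong at both ends: skip the costly check
        distance = edit_distance(word, entry)
        if distance <= max_distance:
            hits.append((distance, entry))
    return [entry for distance, entry in sorted(hits)]
```

The price of the shortcut is visible with a word that is wrong at both ends: `candidates("xeceivx", ["receive"])` finds nothing, even though the two words are only two substitutions apart.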
Anyway, for the task at hand (building a common-error-tracker for suggestions and rankings), the 1100 or so entries I have from your set plus the corpus I mentioned should be enough for everything but stress-testing and distributable "starter" dictionaries.
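For anyone wondering what such a tracker might look like, here is a minimal sketch in Python. The class name `CommonErrorTracker` and its methods `record`, `rank` and `export` are hypothetical, made up for this illustration, and are not part of Aspell or of my spell-checker.

```python
from collections import Counter, defaultdict


class CommonErrorTracker:
    """Remembers which correction a user picked for each misspelling."""

    def __init__(self):
        # misspelling -> Counter mapping each chosen correction to a count
        self._table = defaultdict(Counter)

    def record(self, misspelling, correction):
        """Note that the user replaced `misspelling` with `correction`."""
        self._table[misspelling.lower()][correction] += 1

    def rank(self, misspelling, suggestions):
        """Reorder `suggestions` so past choices come first.

        sorted() is stable, so suggestions never chosen before keep
        the order the underlying spell-checker gave them.
        """
        seen = self._table.get(misspelling.lower(), Counter())
        return sorted(suggestions, key=lambda s: -seen[s])

    def export(self):
        """Return (misspelling, correction, count) rows for sharing."""
        return [
            (misspelling, correction, count)
            for misspelling in sorted(self._table)
            for correction, count in sorted(self._table[misspelling].items())
        ]
```

A short usage example, with made-up data:

```python
tracker = CommonErrorTracker()
tracker.record("recieve", "receive")
tracker.record("recieve", "receive")
tracker.record("recieve", "relieve")
print(tracker.rank("recieve", ["relieve", "receive", "recede"]))
# ['receive', 'relieve', 'recede']
```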
Will let you know if I find a large database some day (though, as a note, if developers actually use this feature, it should be possible to simply ask users to export their common-errors tables and send them to us to get very large corpi (corpuses?) :) ).
Thanks,
Mike

Kevin Atkinson wrote:

> On Wed, 6 Nov 2002, Mike C. Fletcher wrote:
>
> > I'm wondering if anyone knows of a decent mis-spellings database anywhere? That is, a mapping from mis-spelling to correct spelling (or vice-versa)? I'm currently using a 550-item set adapted from:
> > http://www.actwin.com/rwmack/spelling.htm
> > and it's fine for testing, but I'm looking for something that might have a few tens of thousands of entries. Basically, I want to build a "common error tracking" system into my spell-checker, and would like a corpus of (real-world (English)) data so that I can judge the effectiveness of the new feature when built.
>
> The best I can offer you is a small set I used for testing Aspell. It contains a lot of my own misspellings plus a few others from other sources. It has a lot of the more difficult misspellings that many spell checkers are unable to get. You can find it at http://aspell.net/test/.
>
> If you do find such a list I will certainly be interested in it also.
>
> ---
> http://kevin.atkinson.dhs.org
--
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/