aspell-user

[aspell] Re: [dict-beta] Aspell (formerly Kspell) News and Updates


From: Kevin Atkinson
Subject: [aspell] Re: [dict-beta] Aspell (formerly Kspell) News and Updates
Date: Wed, 28 Oct 1998 10:15:53 -0500

Rik Faith wrote:
> 
> Just a short note on copyright issues.
> 
> The list of "not found" words on the FILE page are words which thousands of
> random people typed into dict search engines.  These lists are in the
> public domain because our hope is that people will use them for things like
> aspell and will contribute their work back to the community.  For example,
> a huge public domain list of misspelled words with 1-3 possible corrections
> per word would be a valuable contribution that we should host from the
> file page.  [I say 1-3, because some of the word pairs in the aspell list
> are not corrections which I would have selected.]

The problem is that the list will be constantly changing as aspell
improves. Let's talk about how best to work this out. Perhaps a link to
my page instead? Or maybe a way for me to have access to update the page
myself?

Feel free to keep a local copy, but keep in mind that it may not always
be up to date.

Also, correct.tab.gz should prove very valuable to the FILE project.
Here, the larger the word list used, the better. I also plan to run this
list against the dict server's dictionaries to eliminate words that have
since been added. Checking the correct list for inflections of root
words would also help.
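
As a rough illustration of the two checks just described, here is a
minimal Python sketch. It is not aspell's actual tooling. It assumes the
list is available as (misspelling, correction) pairs and that the dict
server's dictionaries can be flattened into a plain collection of words.
The function names and the suffix list are hypothetical, and the suffix
test is a crude stand-in for real affix handling.

```python
def prune_known_words(pairs, dictionary_words):
    """Drop pairs whose 'misspelling' now appears in a dictionary.

    pairs: iterable of (misspelling, correction) tuples (assumed layout).
    dictionary_words: iterable of words taken from the dictionaries.
    The comparison is case-insensitive.
    """
    known = {w.lower() for w in dictionary_words}
    return [(bad, good) for bad, good in pairs if bad.lower() not in known]


def is_simple_inflection(word, roots, suffixes=("s", "es", "ed", "ing")):
    """Return True if stripping one listed suffix leaves a known root.

    roots: set of lowercase root words.
    This only handles plain suffix concatenation; it does not model
    spelling changes such as dropping a final 'e'.
    """
    w = word.lower()
    for suffix in suffixes:
        if w.endswith(suffix) and w[: -len(suffix)] in roots:
            return True
    return False


if __name__ == "__main__":
    sample_pairs = [("recieve", "receive"), ("colour", "color"), ("teh", "the")]
    sample_dictionary = ["colour", "Receive"]
    # "colour" is dropped because it is present in the sample dictionary.
    print(prune_known_words(sample_pairs, sample_dictionary))
    print(is_simple_inflection("walked", {"walk"}))
```

Entries flagged by the second check would still need a manual look,
since a plain suffix test can both miss and over-match real inflections.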

> 
> The list of "found" words on the FILE page are a subset of words which
> appear in all of the hosted dictionaries, with the subset selected from the
> input of thousands of random people.  I think this list can arguably be
> placed in the public domain, but this is not as clear as the other case.
> Fortunately, from a spell-checking standpoint, this list is far less
> interesting than the "not found" list.  Certainly, it can be used to prime
> a spelling checker, but so can the web2 wordlist and the (now public
> domain) Moby Words word list.  Although papers from the 1970's suggest that
> a word list for a spell checker should have no more than about 30k words on
> it -- otherwise too many typos are marked as "correct" because they are
> actually a correctly spelled word that is infrequently used.

-- 
Kevin Atkinson
address@hidden
http://sunsite.unc.edu/kevina/

---
Note: This message was originally posted to address@hidden,
      not address@hidden



