[Aspell-user] digit behavior
From: Michael Howard
Subject: [Aspell-user] digit behavior
Date: Sat, 1 May 2010 13:57:29 -0400
I am investigating aspell for use on a large set of scanned pages with
text that was generated through OCR.
I searched through the mailing list archive and found
http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html
wherein Kevin Atkinson explains that aspell was not designed for
OCR-type errors.
Nevertheless, I chose to proceed a bit, primarily because I was
unable to find anything open source that was better. Unfortunately I
did not get very far.
aspell seems to ignore any words with digits in them, and my OCR text
has plenty of digit/character confusion. I was unable to find any
options to control behavior with digits.
Searching the mailing list again I found
http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html
wherein Thomas Güttler suggested modifying the cset table so that
additional characters could be treated as word characters. I tried
copying the .cset file, modifying it to turn the Digits into Letters, and
specifying my cset using --encoding on the command line. However,
the behavior did not change: words with digits in them were still
ignored and did not show up with --list.
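
As a stopgap outside aspell, the tokens that mix letters and digits could at least be pulled out for separate review. The following is only a minimal sketch under my own assumptions: it does not use or change aspell in any way, the function name `mixed_tokens` and the sample sentence are made up for illustration, and a "word" is taken to be a run of letters, digits, or underscores.

```python
import re

# Any Unicode letter: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

# A whole token containing at least one letter and at least one digit.
# Purely numeric tokens such as "1987" are deliberately left alone.
MIXED = re.compile(rf"\b(?=\w*{LETTER})(?=\w*\d)\w+\b")


def mixed_tokens(text):
    """Return tokens that mix letters and digits, in order of appearance."""
    return MIXED.findall(text)


if __name__ == "__main__":
    # Hypothetical OCR output with digit/character confusion.
    sample = "The qu1ck brown f0x was born in 1987 near R0me."
    print(mixed_tokens(sample))  # ['qu1ck', 'f0x', 'R0me']
```

This only lists the suspicious tokens; it makes no attempt to suggest corrections for them.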
Any comments/suggestions/advice appreciated.
Michael