Re: [aspell-user] Aspell and OCR-generated text
From: Kevin Atkinson
Subject: Re: [aspell-user] Aspell and OCR-generated text
Date: Fri Jul 12 16:10:23 2002
On Thu, 11 Jul 2002 address@hidden wrote:
> ... It would be much better, though, if aspell's
> algorithms were oriented toward the kinds of mistakes OCR engines make
> rather than the kinds made by human typists.
Aspell's algorithms are not really tuned for the kinds of mistakes made by
typists. Rather, they are tuned for the kinds of mistakes humans
(especially me) tend to make when trying to spell a word. The typo
analysis in Aspell biases the results slightly, but it generally doesn't
make a huge difference.
> I can see how you might do this
> by working with the translation tables for the phonetic code, the keyboard
> files, etc.
You will probably get better results by turning the soundslike analysis
off altogether. Modifying the keyboard file will also help. However,
the best results will probably come from modifying the weights in
TypoEditDistanceWeights, found in util/typo_editdist.hh. Doing so will
require modifying the code a bit: the code that fills in the weights can
be found in SuggestParms::fill_distance_lookup in lib/suggest.cc. Most of
the code should be self-explanatory.
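As a rough illustration of what re-weighting can buy you for OCR text, here is a generic weighted edit distance in Python. It is not Aspell's code: it simply makes substituting characters that OCR engines commonly confuse cheaper than an arbitrary substitution. The confusion pairs, the cost values (on an arbitrary 0-100 scale), and the names OCR_CONFUSIONS, sub_cost and weighted_edit_distance are all hypothetical; in Aspell the corresponding knobs are TypoEditDistanceWeights and SuggestParms::fill_distance_lookup.

```python
# HYPOTHETICAL OCR confusion pairs and costs, chosen for illustration only;
# they are not taken from Aspell or from any real OCR error statistics.
OCR_CONFUSIONS = {
    frozenset("l1"): 20,   # lowercase L vs digit one
    frozenset("O0"): 20,   # capital O vs digit zero
    frozenset("il"): 30,
    frozenset("ec"): 40,
}

DEFAULT_SUB = 100   # cost of replacing a character with an unrelated one
INSERT_COST = 100   # cost of inserting one character
DELETE_COST = 100   # cost of deleting one character


def sub_cost(a, b):
    """Cost of substituting character a with b (symmetric)."""
    if a == b:
        return 0
    return OCR_CONFUSIONS.get(frozenset((a, b)), DEFAULT_SUB)


def weighted_edit_distance(source, target):
    """Dynamic-programming edit distance with per-pair substitution costs."""
    rows, cols = len(source) + 1, len(target) + 1
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * DELETE_COST
    for j in range(1, cols):
        d[0][j] = j * INSERT_COST
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + DELETE_COST,                               # drop source[i-1]
                d[i][j - 1] + INSERT_COST,                               # add target[j-1]
                d[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return d[-1][-1]


if __name__ == "__main__":
    candidates = ["hemp", "help", "heap"]
    ranked = sorted(candidates, key=lambda w: weighted_edit_distance("he1p", w))
    print(ranked[0])
```

With these made-up weights the OCR misread "he1p" is 20 away from "help" but 100 away from "hemp" or "heap", so "help" ranks first. Errors where OCR splits or merges characters (for example "rn" read in place of "m") are not captured by single-character substitutions and would need additional operations.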
You really need to understand how edit distance works in order to know
what to modify. The comments in the util/*editdist* files should give you
enough information for this understanding.
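For anyone new to edit distance, the textbook unit-cost (Levenshtein) version is sketched below in Python. It is a generic illustration, not a transcription of the util/*editdist* code, and the function name edit_operations is hypothetical. Besides the distance, it walks back through the dynamic-programming table to recover one cheapest sequence of edits, which makes it easier to see which operation a given weight would affect.

```python
def edit_operations(a, b):
    """Unit-cost Levenshtein distance plus one cheapest edit script."""
    n, m = len(a), len(b)
    # d[i][j] = minimum number of edits turning a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                            # delete a[i-1]
                d[i][j - 1] + 1,                            # insert b[j-1]
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),   # substitute or match
            )

    # Walk back from the bottom-right cell, following any predecessor
    # that accounts for the value stored in the current cell.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(f"substitute {a[i - 1]}->{b[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(f"delete {a[i - 1]}")
            i -= 1
        else:
            ops.append(f"insert {b[j - 1]}")
            j -= 1
    ops.reverse()
    return d[n][m], ops


if __name__ == "__main__":
    print(edit_operations("he1p", "help"))
```

In a weighted variant like the one Aspell uses, each "+ 1" above becomes a cost looked up per operation (and possibly per character pair), which is exactly where tuning for OCR errors would take effect.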
---
http://kevin.atkinson.dhs.org