Hello Don.
Don Moir wrote:
Results so far look good and better than other sources I have tried.
Thanks for the feedback.
1) An orphan capital letter I fails to be detected.
Textline::recognize2 is supposed to become some kind of expert system for post-processing of recognized text, but it still lacks a
lot of "rules". This is one of them. I'll add rules for isolated 'I' and 'UP' in the next release of ocrad.
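To illustrate the idea of a post-processing "rule", here is a minimal sketch in Python. It is not ocrad's code: the names LOOKALIKES, fix_isolated_i, RULES and postprocess are hypothetical, and the assumption that a lone 'I' comes out as 'l' or '|' is only a guess at the kind of confusion involved.

```python
# Hypothetical confusions: glyphs a lone capital 'I' might be misread as.
LOOKALIKES = {"l", "|"}


def fix_isolated_i(words):
    """Replace a word consisting only of an 'I' look-alike with 'I'."""
    return ["I" if w in LOOKALIKES else w for w in words]


# Each rule takes a list of words and returns a corrected list.
RULES = [fix_isolated_i]


def postprocess(line):
    """Apply every rule, in order, to the words of one recognized text line."""
    words = line.split(" ")
    for rule in RULES:
        words = rule(words)
    return " ".join(words)


print(postprocess("l think | am ill"))  # prints "I think I am ill"
```

Only whole words are touched, so the 'l' characters inside "ill" are left alone. Further rules would be appended to RULES.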
3) Failure to detect a space character in latin_space.pbm.
This one is a little trickier. Currently ocrad measures the distance between "character boxes". It should measure the distance
between the black blobs inside those character boxes, but this is more difficult to do. I plan to fix this in a future version of
ocrad.
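The difference between the two measurements can be sketched as follows. This is a Python illustration under my own assumptions, not ocrad's code: each blob is a set of (x, y) black-pixel coordinates, and bounding_x, box_gap and ink_gap are hypothetical names. The "ink" distance is taken here as the smallest row-by-row gap, which is only one possible reading of "distance between the black blobs".

```python
def bounding_x(pixels):
    """Horizontal extent (x_min, x_max) of a blob, both ends inclusive."""
    xs = [x for x, _ in pixels]
    return min(xs), max(xs)


def box_gap(left_box, right_box):
    """Number of empty columns between two bounding boxes."""
    return right_box[0] - left_box[1] - 1


def ink_gap(left_pixels, right_pixels):
    """Smallest row-wise gap between the ink of two blobs.

    Only rows where both blobs have ink are compared.
    Returns None if the blobs share no row.
    """
    rightmost = {}  # row -> rightmost black x of the left blob
    for x, y in left_pixels:
        rightmost[y] = max(x, rightmost.get(y, x))
    leftmost = {}   # row -> leftmost black x of the right blob
    for x, y in right_pixels:
        leftmost[y] = min(x, leftmost.get(y, x))
    shared = rightmost.keys() & leftmost.keys()
    if not shared:
        return None
    return min(leftmost[y] - rightmost[y] - 1 for y in shared)


# Two slanted strokes, like italic letters: the boxes almost touch,
# but in every row the ink is well separated.
left = {(3, 0), (2, 1), (1, 2), (0, 3)}
right = {(8, 0), (7, 1), (6, 2), (5, 3)}

print(box_gap(bounding_x(left), bounding_x(right)))  # prints 1
print(ink_gap(left, right))                          # prints 4
```

With slanted characters the box gap (1 column) is much smaller than the ink gap (4 columns), which is how a real space can fall below a threshold applied to box distances.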
4) Failure to detect merged ti, vi, im, ll in merged_ti_vi_im_ll.pbm
This one is the most difficult. The latest version of ocrad has fixed some problems like those, but there are lots of them (even
with more than two letters merged). I'll try to fix as many as I can, but I can't promise anything.
The attached zip contains 6 files:
Next time, please send the images to my email address, not to the list. Thanks.
I am wondering if possible merged characters should be added as special characters, like TT, ti, etc., so that in future it's easy
to add such combinations.
I have in fact removed some such combinations from the latest version of ocrad. There are just too many of them, and trying to
recognize them worsens recognition results.
Best regards,
Antonio.