Re: [Bug-ocrad] A few ocrad problems
From: Antonio Diaz Diaz
Subject: Re: [Bug-ocrad] A few ocrad problems
Date: Sun, 02 Jun 2013 21:46:36 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905
Hello Don.
Don Moir wrote:
Results so far look good and better than other sources I have tried.
Thanks for the feedback.
1) An orphan capital letter I fails to be detected.
Textline::recognize2 is meant to become a kind of expert system for
post-processing recognized text, but it still lacks many "rules". This
is one of them. I'll add rules for isolated 'I' and 'UP' in the next
release of ocrad.
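
To illustrate what a post-processing "rule" of this kind might look like, here is a minimal Python sketch. It is not ocrad's code (ocrad is written in C++), and both the function name fix_isolated_i and the assumption that an isolated capital I gets misread as a lone 'l' or '|' are hypothetical.

```python
import re

# Hypothetical rule: a token consisting only of 'l' or '|', with whitespace
# (or the start/end of the text) on both sides, is assumed to be a misread
# capital 'I'. This is an illustration, not the rule used by ocrad.
_ISOLATED_I = re.compile(r"(?<!\S)[l|](?!\S)")

def fix_isolated_i(text):
    """Replace stand-alone 'l' or '|' tokens with 'I'."""
    return _ISOLATED_I.sub("I", text)
```

A rule like this only touches stand-alone tokens, so words such as "all" or "well" are left alone.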
3) Failure to detect a space character in latin_space.pbm.
This one is a little trickier. Currently ocrad measures the distance
between "character boxes". It should measure the distance between the
black blobs inside those character boxes, but this is more difficult to
do. I plan to fix this in a future version of ocrad.
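
The difference between the two measurements can be sketched as follows. This is a simplified Python illustration, not ocrad's implementation: the bitmap, the box format (first_col, last_col), and the function names box_gap and ink_gap are all assumptions, and the "blob" distance is approximated as the smallest row-by-row horizontal gap between black pixels.

```python
# A 'T' (columns 0-4) followed by an 'A' (columns 6-10); '#' is a black pixel.
BITMAP = [
    "#####...#..",
    "..#....#.#.",
    "..#...#####",
    "..#...#...#",
]

def box_gap(left_box, right_box):
    """Empty columns between two character boxes given as (first_col, last_col)."""
    return right_box[0] - left_box[1] - 1

def ink_gap(bitmap, left_box, right_box):
    """Smallest per-row gap between the black pixels of the two characters."""
    gaps = []
    for row in bitmap:
        left_ink = [c for c in range(left_box[0], left_box[1] + 1) if row[c] == "#"]
        right_ink = [c for c in range(right_box[0], right_box[1] + 1) if row[c] == "#"]
        if left_ink and right_ink:
            gaps.append(min(right_ink) - max(left_ink) - 1)
    # If the characters share no inked row, fall back to the box distance.
    return min(gaps) if gaps else box_gap(left_box, right_box)
```

Here the boxes are only 1 column apart, while the ink is never closer than 3 columns on any row, so a space threshold applied to the box distance could miss a gap that the ink distance would show.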
4) Failure to detect merged ti, vi, im, ll, in merged_ti_vi_im_ll.pbm
This one is the most difficult. The latest version of ocrad has fixed
some problems like these, but there are lots of them (even cases with
more than two letters merged). I'll try to fix as many as I can, but I
can't promise anything.
The attached zip contains 6 files:
Next time, please send the images to my email address, not to the list.
Thanks.
I am wondering if possible merged characters should be added as special
characters, like TT, ti, etc., so that in the future it is easy to add
such combinations.
I have in fact removed some such combinations from the latest version of
ocrad. There are just too many of them, and trying to recognize them
worsens recognition results.
Best regards,
Antonio.