Re: [Bug-ocrad] A few ocrad problems

From:	Don Moir
Subject:	Re: [Bug-ocrad] A few ocrad problems
Date:	Tue, 11 Jun 2013 08:48:25 -0400

Hi Antonio,

>> 4) Failure to detect merged ti, vi, im, ll, in merged_ti_vi_im_ll.pbm
> This one is the most difficult. The last version of ocrad has fixed some problems like those, but there are lots of them (even with more than two letters merged). I'll try to fix as many as I can, but I don't promise anything.

I am pretty new to OCR, but I tested several OCR programs before I tried ocrad. So far, for my purposes, ocrad has worked better, but the space problem and merged characters are still a problem (not just for ocrad, of course).

I like the idea of feature extraction, but I am thinking I would like to generate the feature/rule set either on the fly or from database files. It seems to me that this could aid in the detection of merged characters: we know exactly what a capital T looks like, and a merged TT is just two capital T's together. I am not sure about this, as I have not looked into it much. Also, some character sets, such as Old English style, might be hard to recognize with a fixed rule set.
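
The "a merged TT is just two capital T's together" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not how ocrad works: the tiny bitmaps, the TEMPLATES table, and the functions concat, similarity, and recognize_merged are all hypothetical names made up for this example. The sketch tries every split column of a wide blob and checks whether both halves match a known letter.

```python
# Hypothetical sketch: none of these names or bitmaps come from ocrad.
# A glyph is a tuple of equal-length strings; '#' is black, '.' is white.

TEMPLATES = {
    "T": ("#####",
          "..#..",
          "..#..",
          "..#.."),
    "I": ("#",
          "#",
          "#",
          "#"),
}

def concat(left, right):
    """Place two glyphs of equal height side by side with no gap (a 'merged' pair)."""
    return tuple(a + b for a, b in zip(left, right))

def similarity(a, b):
    """Fraction of matching pixels, or 0.0 when the two shapes differ in size."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    same = sum(p == q for x, y in zip(a, b) for p, q in zip(x, y))
    return same / total

def recognize_merged(blob, threshold=0.9):
    """Try every split column; return the best two-letter reading, or None."""
    width = len(blob[0])
    best = None
    for cut in range(1, width):
        left = tuple(row[:cut] for row in blob)
        right = tuple(row[cut:] for row in blob)
        for lname, ltemp in TEMPLATES.items():
            for rname, rtemp in TEMPLATES.items():
                # A pair is only as good as its weaker half.
                score = min(similarity(left, ltemp), similarity(right, rtemp))
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, lname + rname)
    return best[1] if best else None

merged_tt = concat(TEMPLATES["T"], TEMPLATES["T"])
print(recognize_merged(merged_tt))
```

Note that the two nested loops visit every ordered pair of templates at every split column, so the work grows quickly with the size of the character set, and real merged letters overlap rather than simply abut.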


Hopefully I will eventually have the time to look at it in more detail.

Thanks,

Don

----- Original Message -----
From: "Antonio Diaz Diaz" <address@hidden>
To: "Don Moir" <address@hidden>
Cc: <address@hidden>
Sent: Sunday, June 02, 2013 3:46 PM
Subject: Re: [Bug-ocrad] A few ocrad problems

Hello Don.

Don Moir wrote:
> Results so far look good and better than other sources I have tried.

Thanks for the feedback.

> 1) An orphan capital letter I fails to be detected.

Textline::recognize2 is supposed to become some kind of expert system for post-processing of recognized text, but it still lacks a lot of "rules". This is one of them. I'll add rules for isolated 'I' and 'UP' in the next release of ocrad.

> 3) Failure to detect a space character in latin_space.pbm.

This one is a little trickier. Currently ocrad measures the distance between "character boxes". It should measure the distance between the black blobs inside those character boxes, but this is more difficult to do. I plan to fix this in a future version of ocrad.

> 4) Failure to detect merged ti, vi, im, ll, in merged_ti_vi_im_ll.pbm

This one is the most difficult. The last version of ocrad has fixed some problems like those, but there are lots of them (even with more than two letters merged). I'll try to fix as many as I can, but I don't promise anything.

> The attached zip contains 6 files:

Next time, please, send the images to my email address, not to the list. Thanks.

> I am wondering if possible merged characters should be added as special characters, like TT, ti, etc., so that in the future it's easy to add such combinations.

I have in fact removed some such combinations from the last version of ocrad. There are just too many of them, and trying to recognize them worsens recognition results.

Best regards,
Antonio.
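
The difference between measuring "character boxes" and measuring the black blobs inside them (item 3 above) can be illustrated with a small sketch. It is an assumption-laden toy, not ocrad code: the image, the pixel-set representation, and the functions pixels, box_gap, and blob_gap are hypothetical, and "blob distance" is interpreted here as the narrowest horizontal white run on any row that both glyphs occupy, which is only one possible reading. Whether this matches the failure in latin_space.pbm is not known.

```python
# Hypothetical sketch; the representation and function names are not from ocrad.
# The image is a list of equal-length strings; '#' is black, '.' is white.

IMAGE = [
    ".#####....",   # the left glyph has an overhang on its top row
    ".#........",
    ".#.....##.",   # the right glyph is short and sits low
    ".#.....##.",
]

def pixels(image, col_from, col_to):
    """Set of (row, col) black pixels with col_from <= col <= col_to."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, ch in enumerate(row)
            if ch == "#" and col_from <= c <= col_to}

def box_gap(left_pixels, right_pixels):
    """White columns between the two bounding boxes (negative if they overlap)."""
    left_edge = max(c for _, c in left_pixels)
    right_edge = min(c for _, c in right_pixels)
    return right_edge - left_edge - 1

def blob_gap(left_pixels, right_pixels):
    """Narrowest white run between the glyphs on any row they share, else None."""
    shared_rows = {r for r, _ in left_pixels} & {r for r, _ in right_pixels}
    gaps = []
    for r in shared_rows:
        last_left = max(c for rr, c in left_pixels if rr == r)
        first_right = min(c for rr, c in right_pixels if rr == r)
        gaps.append(first_right - last_left - 1)
    return min(gaps) if gaps else None

left = pixels(IMAGE, 0, 5)    # assumed segmentation: columns 0..5
right = pixels(IMAGE, 7, 9)   # assumed segmentation: columns 7..9

print(box_gap(left, right))   # gap between bounding boxes
print(blob_gap(left, right))  # gap between the black pixels themselves
```

In this toy image the overhang makes the boxes look almost adjacent (one white column), while the black pixels on the shared rows are five columns apart, which is the kind of discrepancy that could hide a space.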

Current Thread

[Bug-ocrad] A few ocrad problems, Don Moir, 2013/06/01
- Re: [Bug-ocrad] A few ocrad problems, Antonio Diaz Diaz, 2013/06/02
  - Re: [Bug-ocrad] A few ocrad problems, Don Moir <=
