bug-ocrad



From: Antonio Diaz Diaz
Subject: Re: [Bug-ocrad] Technical documentation summary readme.txt, page skew, Hough transform.
Date: Sat, 04 Mar 2006 15:21:04 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.12) Gecko/20050923

Chris K. Skinner wrote:
As you are probably aware, there are software patents in some countries.

Yes, in almost every country with a corrupt and/or fascist government.


If you had some kind of outline of the algorithms applied in each version of the software, it would greatly help someone new coming in fresh off the street to understand the code more quickly. It would probably also demonstrate to the world at large that you have invented something new, so that it could not be patented / stolen / claimed by some greedy corporate dudes.

I sympathize with your idea, but I lack the time and the ability to explain the algorithms I use or invent. On the other hand, a patent is valid even if I invented it independently, so it won't be an effective defense.


Do you have any design notes, bibliographic citations, web links to information that you've made use of, or release notes on which algorithms are being used / abandoned?

The short answer is no. I have looked into the source of gocr and claraocr, but I haven't got anything from them. I use the Otsu algorithm for binarization (as gocr does). Apart from this, I work mostly in a vacuum.
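
For readers who do not know the Otsu algorithm mentioned above, here is a minimal sketch. It is not taken from ocrad or gocr; the function name `otsu_threshold` and the flat 8-bit pixel buffer are assumptions made for illustration. It picks the gray level that maximizes the between-class variance of the histogram.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not ocrad's code: returns the threshold t in [0, 255]
// that maximizes the between-class variance. Pixels with value <= t form
// one class, the remaining pixels the other.
int otsu_threshold(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());

    // Gray-level histogram.
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    const long long total = static_cast<long long>(pixels.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i) sum_all += static_cast<double>(i) * hist[i];

    long long w0 = 0;        // pixel count of the class with values <= t
    double sum0 = 0.0;       // sum of gray values in that class
    double best_var = -1.0;  // best (unnormalized) between-class variance
    int best_t = 0;

    for (int t = 0; t < 256; ++t) {
        w0 += hist[t];
        sum0 += static_cast<double>(t) * hist[t];
        if (w0 == 0) continue;          // lower class still empty
        const long long w1 = total - w0;
        if (w1 == 0) break;             // upper class empty from here on

        const double m0 = sum0 / static_cast<double>(w0);
        const double m1 = (sum_all - sum0) / static_cast<double>(w1);
        const double var = static_cast<double>(w0) * static_cast<double>(w1) *
                           (m0 - m1) * (m0 - m1);
        if (var > best_var) {
            best_var = var;
            best_t = t;
        }
    }
    return best_t;
}
```

A pixel would then be treated as ink or background depending on whether its value is above or below the returned threshold. Which side counts as ink depends on the image polarity.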


In J. R. Parker's book (with CD-ROM) "Algorithms for Image Processing and Computer Vision", which I have read, the author suggests a couple of algorithms for combating the page skew angle issue. With his source code, a Hough transform applied to the bottom points of the bounding boxes of glyphs yields the page skew angle in degrees.

This has a number of problems:
- The Hough transform, and in general any transformation of the whole image, is slow as hell.
- What if the line is not skewed but curved? (frequent in scanned books).
- "The bottoms of the bounding boxes of glyphs" are usually not aligned.
- etc...

This is why I expect working code, not suggestions, from possible collaborators. (Show me the code, you know?) ;-)

By the way, ocrad's algorithms are designed to be resistant to page skew.
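
For reference, the Hough-based skew estimate discussed above can be sketched roughly as follows. This is neither Parker's code nor anything used in ocrad; the `Point` type, the function name `estimate_skew_degrees` and the default parameters are assumptions made for illustration. Each candidate angle gets its own histogram of rho values, and the angle whose histogram has the highest single peak wins. The sketch only handles straight baselines, so the objections listed above (curved lines, misaligned glyph bottoms) still apply.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical input type: one point per glyph, e.g. the bottom center
// of its bounding box.
struct Point { double x, y; };

// Returns the candidate angle in degrees (searched in [-max_deg, max_deg])
// at which the most points fall into a single rho bin. The angle is
// positive when y grows with x; with a downward image y axis this means
// a line sloping down to the right.
double estimate_skew_degrees(const std::vector<Point>& pts,
                             double max_deg = 15.0,
                             double step_deg = 0.1,
                             double rho_bin = 1.0) {
    assert(step_deg > 0.0 && rho_bin > 0.0);
    const double pi = std::acos(-1.0);
    const int steps =
        static_cast<int>(std::floor(2.0 * max_deg / step_deg + 0.5));

    double best_deg = 0.0;
    int best_votes = 0;
    for (int i = 0; i <= steps; ++i) {
        const double deg = -max_deg + i * step_deg;
        const double rad = deg * pi / 180.0;
        const double c = std::cos(rad);
        const double s = std::sin(rad);

        // Points on a line y = x * tan(angle) + k all share the same
        // rho = y * cos(angle) - x * sin(angle) = k * cos(angle).
        std::map<long, int> votes;
        int peak = 0;
        for (const Point& p : pts) {
            const double rho = p.y * c - p.x * s;
            const long bin = static_cast<long>(std::floor(rho / rho_bin));
            const int v = ++votes[bin];
            if (v > peak) peak = v;
        }
        if (peak > best_votes) {
            best_votes = peak;
            best_deg = deg;
        }
    }
    return best_deg;
}
```

The cost grows with the number of points times the number of candidate angles, which is far cheaper than transforming every pixel of the image but still depends on a glyph segmentation being available first.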


Another approach is to use angle-independent Complex-Number-Coefficient Neural Networks as feature recognizers. The Japanese promoter of these neural networks says that they are insensitive to affine transforms, and can thereby recognize a pattern that has been so transformed.

I would like to see this recognizing a page in less than five minutes with good accuracy.


This too is just a theory. I don't have a copy of any books on Complex-Number-Coefficient Neural Networks, or any source code from a competent mathematician who has converted the advanced mathematics into working C++ code examples.

Don't worry. "Advanced mathematics" are the software of the future... and they will always be. ;-)


Regards,
Antonio.



