[Bug-ocrad] Ocrad on Windows
From: skip
Subject: [Bug-ocrad] Ocrad on Windows
Date: Thu, 10 Aug 2006 10:09:35 -0500
I'm one of the developers of SpamBayes (http://www.spambayes.org/). A
frequent kind of spam these days is a message with essentially no text (or
random gibberish) and one or more GIF images containing a pitch for
cheap pharmaceuticals or penny stocks.
I recently added code to use Ocrad to extract the text from these images:
http://mail.python.org/pipermail/spambayes-dev/2006-August/003697.html
http://mail.python.org/pipermail/spambayes-dev/2006-August/003699.html
http://mail.python.org/pipermail/spambayes-dev/2006-August/003715.html
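For readers who want the general shape of the approach, here is a minimal
sketch. It is not the code from the messages linked above. It assumes that
the GIF is first converted to PNM with Netpbm's giftopnm, and that ocrad
reads the image from standard input when it is given no file argument. The
function name extract_text and its parameters are hypothetical, and decoding
the output as Latin-1 is a conservative assumption.

```python
import subprocess


def extract_text(image_bytes, convert_cmd=("giftopnm",), ocr_cmd=("ocrad",)):
    """Pipe image bytes through a converter, then an OCR program.

    Both commands are assumptions about the local setup; each must read
    from stdin and write to stdout. Returns "" if either step fails, so a
    missing tool never prevents a message from being scored.
    """
    try:
        # Step 1: convert the GIF to PNM, the format the OCR step consumes.
        pnm = subprocess.run(
            list(convert_cmd),
            input=image_bytes,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        ).stdout
        # Step 2: run OCR on the converted image.
        ocr = subprocess.run(
            list(ocr_cmd),
            input=pnm,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # Tool not installed, or it rejected the image.
        return ""
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    return ocr.decode("latin-1")
```

The extracted text does not have to be readable by a person. It only has to
yield tokens that recur across similar spam images, which is why imperfect
OCR output can still be useful to a statistical filter.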
Even though ocrad doesn't do a great job at extracting human-readable text
from these images, it does a good enough job, and I expect it will get
better over time. For this technique to be broadly useful in the SpamBayes
community, it will need to be available on Windows. A couple of developers
have compiled ocrad on Windows using Cygwin with one small code change
("std::fprintf" -> "fprintf"). Can we distribute that executable on the
SpamBayes SF site (or convince you to do so) so that we can get Windows
users to test out my new additions?
Related to that, is there any interest in making an OCR library which can be
linked into other applications instead of requiring the program to be run?
Thanks,
--
Skip Montanaro - address@hidden - http://www.mojam.com/
"On the academic side, effort is too often expended on finding precise
answers to the wrong questions." Baxter & Rennie, in "Financial Calculus"