[Aleader-dev] Re: [Jocr-users] subtitles / mplayer


From: Joshua N Pritikin
Subject: [Aleader-dev] Re: [Jocr-users] subtitles / mplayer
Date: Thu, 20 Nov 2003 18:40:09 +0530
User-agent: Mutt/1.4i

----- Forwarded message from Tuukka Toivonen <address@hidden> -----

From: Tuukka Toivonen <address@hidden>
To: Joerg <address@hidden>
Cc: address@hidden
Subject: Re: [Jocr-users] subtitles / mplayer

On Wed, 19 Nov 2003, Joerg wrote:

>> Since I want this working soon, I'm planning to first simply forking and
>> executing gocr in a pipe. But it would be nice if this could be achieve
>Ups! here your mail is aborted by:
>[Formatting error: Non-hexadecimal character in QP encoding: message
>number 93]

Some stupid software got in the way, I guess... here's the rest of my message:

>>>>>
Since I want this working soon, I'm planning to start by simply forking and
executing gocr in a pipe. But it would be nice if this could be achieved
more efficiently.

I captured some segmented subtitles and ran gocr on them by hand, and the
results were promising: not a single misrecognized letter, even though the
text contained A* and O*.
There were some spurious spaces in the middle of words, though.
This isn't even a DVD (which is what <address@hidden> asked about) but a TV
capture.

I'm using Debian 3.0 which has libgocr-0.7.2-4 and gocr-0.3.4-10.
<<<<

Originally A* and O* were a and o with two dots on top of them (ä and ö).
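
For illustration, the fork-and-pipe approach mentioned in the quoted text
might look roughly like the sketch below. It is written in Python rather than
C for brevity, and the function name ocr_via_pipe is made up here. The default
command assumes that gocr reads the image from standard input when given
"-i -"; check this against the installed gocr version.

```python
import subprocess


def ocr_via_pipe(image_bytes, cmd=("gocr", "-i", "-")):
    """Send one image to an OCR command on stdin and return its stdout as text.

    The default command is an assumption: it relies on gocr accepting
    "-i -" to mean "read the image from standard input".
    """
    # subprocess.run forks and executes the command, writes image_bytes to
    # its stdin through a pipe, and collects stdout through a second pipe.
    result = subprocess.run(
        list(cmd),
        input=image_bytes,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    return result.stdout.decode("latin-1")
```

Passing a different cmd makes it possible to try the plumbing without gocr
installed, since any program that copies stdin to stdout will do.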

Also a status update: I have now implemented calling gocr with a simple
system() call, writing the images into a temporary file... it works
surprisingly well. I can now extract subtitles and for the most part they are
even readable. There are lots of extra spaces inside words, but they don't
really make reading that difficult, and they would be easy to fix by hand.
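
As a rough sketch of the temporary-file approach described above (not the
actual mplayer code, which is not shown in this message): the image is written
to a temporary file and the OCR command is run on that file. The sketch uses
Python's subprocess.run in place of a C system() call, the function name
ocr_via_tempfile is hypothetical, and it assumes gocr prints the recognized
text to standard output when given an image path.

```python
import os
import subprocess
import tempfile


def ocr_via_tempfile(image_bytes, cmd=("gocr",), suffix=".pnm"):
    """Write the image to a temporary file, run the OCR command on it,
    and return the command's stdout as text."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(image_bytes)
        # The file is closed before the child runs, so the data is flushed.
        result = subprocess.run(
            list(cmd) + [path], stdout=subprocess.PIPE, check=True
        )
        # Latin-1 maps every byte to a character, so decoding cannot fail.
        return result.stdout.decode("latin-1")
    finally:
        os.remove(path)  # always clean up the temporary image
```

The temporary file costs one extra write and read per subtitle image, but it
avoids any dependence on how the OCR program handles standard input.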

From a 90 min movie there are some 50 kB of extracted subtitles, so fixing
it by hand is not so difficult, especially since the start-of-display and
end-of-display times are almost always correct. Playing back the extracted
subtitles with mplayer works fine too.
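
The message does not say which subtitle file format is written. Purely as an
illustration, and assuming the SubRip (.srt) format, recognized text with its
start-of-display and end-of-display times could be turned into a subtitle file
along these lines. The function names and the (start, end, text) tuple layout
are made up for this sketch.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as SubRip text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue is: running number, time range, then the subtitle text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

A file written this way can be hand-edited in any text editor and loaded in
mplayer with its -sub option.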

However, it's still far from perfect. On a white background in particular it
does very badly... but I think it's quite nice for a first try. I'll clean
up the code, add some improvements, and try to release a patch against
mplayer next week.

[Interesting note: although a recognition result like this would be very bad
for ordinary text, the errors in subtitles aren't so bad, because one
can usually understand enough to follow the movie and guess badly
recognized words from the movie's context]

