Re: [O] orgmode and pdf
From: Jambunathan K
Subject: Re: [O] orgmode and pdf
Date: Tue, 24 Jul 2012 17:53:22 +0530
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (windows-nt)
address@hidden writes:
> Hi list.
> I am trying to build a workflow for mining data from PDFs into Org mode.
> I prefer to read in Emacs, since I have fast dictionary lookup in it and
> many other things.
> There are two tools I think are useful for converting PDFs into text:
> cuneiform to extract text, and pdfimages for image extraction.
> Cuneiform is better than the other text extractors I have tried at
> handling two-column PDFs.
PdfEdit seems interesting as well.
http://sourceforge.net/projects/pdfedit
http://www.cs.unb.ca/~bremner/blog/posts/pdf2text/
PS: I have no experience with PdfEdit and do not know how it fares with
images and captions.
> A PDF is split into pages and each of them is processed separately.
> Using these two programs and some scripting I believe it is possible to
> convert a PDF into an Org file. However, there are two issues I would
> like to solve.
> 1) Is there any way to extract figure captions from a PDF?
> 2) I have no solution for formulas and Greek letters. The only way to
> handle them would be to consult an image of the page.
> Any suggestions? Has anybody tried something similar?
> Thanks.
> Petro.
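For what it is worth, the per-page workflow described above could be
scripted roughly as follows. This is only a sketch under assumptions, not a
tested pipeline: the helper names page_commands and page_to_org are made up
for illustration, I am assuming poppler's pdftoppm is used to rasterize each
page before OCR (cuneiform works on images, not on PDFs directly), that your
cuneiform build can read PNG input, and that your pdfimages is recent enough
to support -png.

```python
from pathlib import Path


def page_commands(pdf, page, workdir):
    """Return the command lines (as argument lists) for one PDF page.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    stem = Path(workdir) / f"page-{page:04d}"
    return [
        # Rasterize just this page to <stem>.png at 300 dpi.
        ["pdftoppm", "-f", str(page), "-l", str(page), "-r", "300",
         "-png", "-singlefile", str(pdf), str(stem)],
        # OCR the page image into <stem>.txt (assumes English text and
        # a cuneiform build that accepts PNG input).
        ["cuneiform", "-l", "eng", "-f", "text",
         "-o", f"{stem}.txt", f"{stem}.png"],
        # Extract the embedded images of this page as <stem>-img-NNN.png.
        ["pdfimages", "-f", str(page), "-l", str(page), "-png",
         str(pdf), f"{stem}-img"],
    ]


def page_to_org(page, text, images):
    """Format one page as an Org subtree: heading, OCR text, image links."""
    lines = [f"* Page {page}", text.strip(), ""]
    lines += [f"[[file:{img}]]" for img in images]
    return "\n".join(lines).rstrip() + "\n"
```

Each argument list can be handed to subprocess.run(cmd, check=True), and the
strings returned by page_to_org concatenated into a single .org file. Keeping
the rasterized page image around also gives you something to consult for the
formulas and Greek letters that OCR gets wrong.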