Re: [O] orgmode and pdf

emacs-orgmode

From:	Jambunathan K
Subject:	Re: [O] orgmode and pdf
Date:	Tue, 24 Jul 2012 17:53:22 +0530
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.1 (windows-nt)

address@hidden writes:

> Hi list.
> I am trying to build a workflow to mine data from PDFs into Org mode.
> I prefer to read in Emacs, since I have fast dictionary lookup there,
> among many other things.
> There are two tools I find useful for converting PDFs into text:
> Cuneiform to extract text, and pdfimages to extract images.
> Cuneiform handles two-column PDFs better than the other text
> extractors I have tried.
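For illustration, the per-page extraction with the two tools named above could be scripted roughly as follows. This is a minimal sketch, not Petro's actual script. It assumes that each page is first rendered to PNG with poppler's pdftoppm (Cuneiform is an OCR engine and reads images, not PDFs), that your Cuneiform build accepts PNG input, and that the output directory `out`, the file naming, and the helpers `page_commands` and `run_all` are hypothetical. Check the flags against `pdftoppm -h`, `pdfimages -h` and `cuneiform --help` on your system.

```python
from __future__ import annotations

import os
import shlex
import subprocess


def page_commands(pdf: str, page: int, outdir: str = "out",
                  lang: str = "eng") -> list[list[str]]:
    """Return the commands that process one (1-based) page of `pdf`."""
    stem = f"{outdir}/page-{page:03d}"  # hypothetical naming scheme
    p = str(page)
    return [
        # Render just this page to <stem>.png at 300 dpi (poppler's pdftoppm).
        ["pdftoppm", "-png", "-r", "300", "-f", p, "-l", p,
         "-singlefile", pdf, stem],
        # OCR the rendered page into <stem>.txt as plain text (Cuneiform).
        ["cuneiform", "-l", lang, "-f", "text", "-o", stem + ".txt",
         stem + ".png"],
        # Extract this page's embedded images as <stem>-000.png, <stem>-001.png, ...
        ["pdfimages", "-png", "-f", p, "-l", p, pdf, stem],
    ]


def run_all(pdf: str, n_pages: int, outdir: str = "out",
            dry_run: bool = True) -> None:
    """Process pages 1..n_pages one at a time; only print commands if dry_run."""
    if not dry_run:
        os.makedirs(outdir, exist_ok=True)
    for page in range(1, n_pages + 1):
        for cmd in page_commands(pdf, page, outdir):
            print(shlex.join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("paper.pdf", 2)  # hypothetical 2-page input; dry run only
```

By default `run_all` only prints the commands; pass `dry_run=False` to actually execute them with `subprocess.run`. Processing one page at a time keeps each page's text and images under a common file stem, which makes reassembling them later straightforward.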

PdfEdit seems interesting as well.

http://sourceforge.net/projects/pdfedit
http://www.cs.unb.ca/~bremner/blog/posts/pdf2text/

PS: I have no experience with PdfEdit, so I can't say how it fares wrt
images and captions.

> The PDF is split into pages, and each page is processed separately.
> Using these two programs and some scripting, I believe it is possible
> to convert a PDF into an Org file. However, there are two issues I
> would like to solve:
> 1) Is there any way to extract figure captions from a PDF?
> 2) I have no solution for formulas and Greek letters. The only way to
> handle them would be to consult an image of the page.
> Any suggestions? Has anybody tried something similar?
> Thanks.
> Petro.
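The "some scripting" step, turning each page's output into Org, might look like the sketch below. It is hypothetical (the function `page_to_org` and the file names are made up for illustration) and does not answer either open question: captions and formulas are not recovered. It only applies the fallback mentioned in the question, linking each page's rendered image next to its OCR text so the original can be consulted from inside Org.

```python
from __future__ import annotations


def page_to_org(page: int, text: str, images: list[str],
                page_image: str) -> str:
    """Return one Org subtree with a page's OCR text, figures and page image."""
    out = [f"* Page {page}"]
    # OCR text is copied verbatim; lines starting with "*" would become headlines.
    out.extend(text.strip().splitlines())
    if images:
        out.append("** Figures")
        # One file link per extracted image (captions are not recovered).
        out.extend(f"[[file:{img}]]" for img in images)
    # Formulas and Greek letters: keep a link to the rendered page to consult.
    out.append("** Page image")
    out.append(f"[[file:{page_image}]]")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(page_to_org(3, "Some OCR text.\n", ["out/page-003-000.png"],
                      "out/page-003.png"), end="")
```

In Org, `C-c C-x C-v` (`org-toggle-inline-images`) displays the linked PNGs inline, which makes checking a formula against the page image quick. The sketch does not guard against OCR lines that begin with `*`.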

Current Thread

[O] orgmode and pdf, x . piter, 2012/07/24
- Re: [O] orgmode and pdf, Jambunathan K <=