Re: [O] Org Mode and PDF Notes!
From: Ramon Diaz-Uriarte
Subject: Re: [O] Org Mode and PDF Notes!
Date: Fri, 13 Nov 2015 00:51:41 +0100
User-agent: mu4e 0.9.13; emacs 24.5.1

On Thu, 12-11-2015, at 23:52, Matt Price <address@hidden> wrote:
> On Thu, Nov 12, 2015 at 9:28 AM, Matt Lundin <address@hidden> wrote:
>
>> Ramon Diaz-Uriarte <address@hidden> writes:
>> >
>> > I'll do. In the meantime, I think this is a limitation coming from
>> > poppler. Other people have mentioned similar things (e.g.,
>> > http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using
>> > other tools that depend on poppler (such as Leela:
>> > https://github.com/TrilbyWhite/Leela) also will not give us the text
>> > itself.
>>
>> I don't think this is a limitation of poppler so much as the way that
>> pdf annotations work. Typically, the subject/text field is not populated
>> by the text of the highlighted region. Rather, a highlight annotation
>> specifies bounds, color, style, etc. Basically what Repligo does (I
>> wouldn't recommend using it, as it is closed source and severely out of
>> date) is to grab the text *at the time of highlighting* and add it to
>> the notes field. I don't know of any other annotation tool that does the
>> same thing. Applications built on poppler could do it, though they
>> currently do not.
>>
>> For extracting the text of highlighted regions *after the fact*, I've
>> had good luck with this script that relies on the pdf-reader gem for
>> ruby:
>>
>> https://gist.github.com/danlucraft/5277732
>>
> This looks interesting. It searches for file "./markup_receiver", but
> doesn't provide that file, which does not appear to be a gem. Any hints?

I think I got it from
https://www.omniref.com/github/danlucraft/pyranine/HEAD/files/lib/pyranine/markup_receiver.rb
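
For illustration only, here is a minimal sketch of the general idea behind
extracting highlighted text after the fact: since a highlight annotation
stores bounds rather than text, one can collect the words whose boxes fall
inside those bounds. This is not taken from the gist above, and it has not
been checked against how that script works. The names Rect, Word and
highlighted_text are hypothetical, and the word boxes are assumed to come
from whatever PDF text extractor is at hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    """Axis-aligned box in PDF user space (origin at bottom-left)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Word:
    """A word plus its bounding box (hypothetical extractor output)."""
    text: str
    box: Rect


def highlighted_text(words: List[Word], highlight: Rect) -> str:
    """Join the words whose box centre lies inside the highlight bounds."""
    picked = [
        w for w in words
        if highlight.contains_point((w.box.x0 + w.box.x1) / 2,
                                    (w.box.y0 + w.box.y1) / 2)
    ]
    # Reading order: top of page first (larger y), then left to right.
    picked.sort(key=lambda w: (-w.box.y0, w.box.x0))
    return " ".join(w.text for w in picked)


# Made-up sample data: two words on one line, one word on the line below.
words = [
    Word("rocks", Rect(10, 680, 40, 692)),
    Word("mode", Rect(34, 700, 60, 712)),
    Word("Org", Rect(10, 700, 30, 712)),
]
highlight = Rect(8, 698, 62, 714)  # covers only the upper line
print(highlighted_text(words, highlight))
```

Testing the box centre rather than full containment is a deliberate
simplification, so that a highlight drawn slightly short of a word's edge
still picks that word up.
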
>
> With politza's help am getting close to being able to extract annotation
> text from within pdf-tools, but am not quite there yet.
Neat!
R.
>
>
>> Matt
>>
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: address@hidden
address@hidden
http://ligarto.org/rdiaz