Re: [O] Org Mode and PDF Notes!
From: Ramon Diaz-Uriarte
Subject: Re: [O] Org Mode and PDF Notes!
Date: Fri, 13 Nov 2015 00:55:14 +0100
User-agent: mu4e 0.9.13; emacs 24.5.1
On Thu, 12-11-2015, at 15:28, Matt Lundin <address@hidden> wrote:
> Ramon Diaz-Uriarte <address@hidden> writes:
>
>>
>> so we get the location of the highlight (and its properties), but not the
>> textual contents. And this is the case whether I make the annotation with
>> EzPDF or Okular or, for that matter, with pdf-tools itself.
>>
>> So it seems RepliGO is actually giving you a lot more by default :-)
>>
>>>
>>> Politza and I are discussing this here:
>>> https://github.com/politza/pdf-tools/issues/137
>>>
>>> That might be a good place to continue the conversation.
>>>
>>
>> I will. In the meantime, I think this is a limitation coming from
>> poppler. Other people have mentioned similar things (e.g.,
>> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
>> tools that depend on poppler (such as Leela:
>> https://github.com/TrilbyWhite/Leela) also will not give us the text
>> itself.
>
> I don't think this is a limitation of poppler so much as the way that
> pdf annotations work. Typically, the subject/text field is not populated
> by the text of the highlighted region. Rather, a highlight annotation
> specifies bounds, color, style, etc. Basically what Repligo does (I
> wouldn't recommend using it, as it is closed source and severely out of
> date) is to grab the text *at the time of highlighting* and add it to
> the notes field. I don't know of any other annotation tool that does the
> same thing. Applications built on poppler could do it, though they
> currently do not.
I stand corrected. You are right; sorry for the sloppiness in the wording
and ideas.
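To make the distinction concrete, here is a minimal Python sketch (not taken from any of the tools mentioned) that models a highlight annotation as a plain dictionary, using key names from the PDF specification (/Type, /Subtype, /Rect, /QuadPoints, /C, /Contents). The `Word` type, the sample words, and the `highlight` helper are hypothetical; a real viewer would get positioned text from its PDF library. By default the helper records only geometry and colour, as most annotation tools do; with `capture_text=True` it also copies the covered text into /Contents, which is roughly the Repligo behaviour described above. The corner order used for each quad is an assumption, since viewers are not consistent about it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on the page with its bounding box in PDF user-space units (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def highlight(words, capture_text=False, color=(1.0, 1.0, 0.0)):
    """Build a highlight annotation dictionary covering `words`.

    One quadrilateral (8 numbers) is emitted per word. The corner order used
    here (upper-left, upper-right, lower-left, lower-right) is an assumption.
    """
    quads = []
    for w in words:
        quads += [w.x0, w.y1, w.x1, w.y1, w.x0, w.y0, w.x1, w.y0]
    annot = {
        "/Type": "/Annot",
        "/Subtype": "/Highlight",
        # Overall bounding rectangle: [xmin, ymin, xmax, ymax]
        "/Rect": [min(w.x0 for w in words), min(w.y0 for w in words),
                  max(w.x1 for w in words), max(w.y1 for w in words)],
        "/QuadPoints": quads,
        "/C": list(color),  # RGB colour; yellow by default
    }
    if capture_text:
        # Repligo-style: store the highlighted text at the time of highlighting.
        annot["/Contents"] = " ".join(w.text for w in words)
    return annot


selection = [Word("Org", 72, 700, 95, 712), Word("mode", 98, 700, 130, 712)]
plain = highlight(selection)
noted = highlight(selection, capture_text=True)
print("/Contents" in plain)  # False: only bounds, colour, etc. are stored
print(noted["/Contents"])    # Org mode
```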
>
> For extracting the text of highlighted regions *after the fact*, I've
> had good luck with this script that relies on the pdf-reader gem for
> ruby:
>
> https://gist.github.com/danlucraft/5277732
That is also what I use for extracting the text from the highlighted
regions.
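The general idea behind this kind of after-the-fact extraction is to read each highlight's /QuadPoints and collect the page text that falls inside them. The following Python sketch shows only that intersection step and is not a port of the gist or of pdf-reader: the `Word` type, the sample data, and the "word centre inside a quad" test are assumptions for illustration. In practice the positioned words would come from a PDF library and the annotation dictionaries from the page's /Annots array.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on the page with its bounding box in PDF user-space units (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def quad_boxes(quadpoints):
    """Turn a flat /QuadPoints array into (xmin, ymin, xmax, ymax) boxes.

    Each quad is 8 numbers; taking min/max makes the result independent of
    the corner order a given viewer used when writing the annotation.
    """
    boxes = []
    for i in range(0, len(quadpoints) - 7, 8):
        xs = quadpoints[i:i + 8:2]
        ys = quadpoints[i + 1:i + 8:2]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def highlighted_text(annot, words):
    """Return the words whose centre lies inside any quad of a highlight annotation."""
    boxes = quad_boxes(annot["/QuadPoints"])
    hits = []
    for w in words:
        cx, cy = (w.x0 + w.x1) / 2, (w.y0 + w.y1) / 2
        if any(bx0 <= cx <= bx1 and by0 <= cy <= by1
               for bx0, by0, bx1, by1 in boxes):
            hits.append(w.text)
    return " ".join(hits)


page_words = [
    Word("Org", 72, 700, 95, 712),
    Word("mode", 98, 700, 130, 712),
    Word("and", 133, 700, 152, 712),
]
# A highlight covering only the first two words; note there is no /Contents.
annot = {"/Subtype": "/Highlight",
         "/QuadPoints": [70, 713, 131, 713, 70, 699, 131, 699]}
print(highlighted_text(annot, page_words))  # Org mode
```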
R.
>
> Matt
--
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain
Phone: +34-91-497-2412
Email: address@hidden
address@hidden
http://ligarto.org/rdiaz