Re: Help with a (query) replacement
From: Juan Manuel Macías
Subject: Re: Help with a (query) replacement
Date: Sat, 12 Nov 2022 16:04:01 +0000
Ypo writes:
> Thanks, Juan Manuel.
>
> I normally study using PDF books. Their typography is like
> "hardcoded", so a post-processing using Orgmode is needed, I think.
If it's a PDF, then forget what I told you about pandoc, because pandoc
is of no use here. I thought you were referring to files in EPUB format,
sorry.
In the case of PDFs, I would use pdftotext. It converts the PDF to plain
text and (in theory) removes the hyphens at line breaks in the converted
text. The resulting plain text is somewhat ugly (page numbers and other
elements are preserved), but if you just want to copy/paste text, I
think it's enough.
The command:
    pdftotext my-file.pdf
https://man.archlinux.org/man/pdftotext.1.en
https://en.wikipedia.org/wiki/Pdftotext
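If the leftover page numbers get in the way, something along these lines might tidy the output. This is only a rough sketch: clean_text is a hypothetical helper name, and it assumes page numbers sit alone on their own lines and that pages are separated by form feed characters. It will also drop any genuine line that contains nothing but digits, so check the result.

```shell
# Hypothetical helper (not part of pdftotext): reads text on stdin,
# deletes form feed characters (assumed to be the page separators),
# then drops every line that contains only digits and whitespace
# (assumed to be page numbers).
clean_text() {
    tr -d '\f' | grep -Ev '^[[:space:]]*[0-9]+[[:space:]]*$'
}

# Usage sketch, with my-file.pdf as in the command above. Passing "-"
# as the output file makes pdftotext write to standard output:
#   pdftotext my-file.pdf - | clean_text > my-file.txt
```

Running headers, footnotes and similar elements would still need to be removed by hand or with a more specific pattern.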