From: Stefan Monnier
Subject: [elpa] externals/eev 92702c7 49/64: Made `find-pdf-text' ignore spurious formfeeds.
Date: Sun, 7 Apr 2019 16:59:11 -0400 (EDT)

branch: externals/eev
commit 92702c742df913d4094cdffe60640a3917bceca5
Author: Eduardo Ochs <address@hidden>
Commit: Eduardo Ochs <address@hidden>

    Made `find-pdf-text' ignore spurious formfeeds.
---
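Note on the formfeed heuristic: the sketch below copies the regexp used by
`ee-pdftotext-replace-bad-ffs' in this commit into a function with a
hypothetical name, `my-replace-bad-ffs', so that it can be tried without
loading eev. The sample string is made up for illustration.

```elisp
;; Hypothetical standalone copy of the heuristic; not part of eev.
(defun my-replace-bad-ffs (bigstr)
  "Replace each run of formfeeds that follows a non-newline char by \"(ff)\"."
  (replace-regexp-in-string "\\([^\n\f]\\)\\(\f+\\)" "\\1(ff)" bigstr t))

;; The formfeeds that follow newlines are kept as page breaks;
;; the one between "a" and "spurious" is rewritten.
(my-replace-bad-ffs "page one\n\fa\fspurious\n\fpage three")
;; => "page one\n\fa(ff)spurious\n\fpage three"
```

A formfeed at the very beginning of the string is also kept, because the
regexp needs a preceding character that is neither a newline nor a formfeed.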
 ChangeLog      |  18 ++++-
 VERSION        |   4 +-
 eev-codings.el |  26 ++++++-
 eev-intro.el   | 235 ++++++++++++++++++++++++++++++++++-----------------------
 eev-pdflike.el |  32 ++++++--
 eev-wrap.el    |   2 +
 6 files changed, 208 insertions(+), 109 deletions(-)
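
Note on the list-valued commands: `ee-find-pdf-text' now returns a list of
program and arguments instead of a shell command string. The commit message
does not state the motivation; one plausible benefit is that a file name
containing quotes or spaces no longer needs shell quoting. The sketch below
illustrates that idea with the standard `call-process' function. The name
`my-callprocess-sketch' is hypothetical, it is not the implementation of
`find-callprocess00' (which this diff does not show), and `echo' stands in
for pdftotext.

```elisp
;; Hypothetical helper, assuming a system that has an `echo' program.
(defun my-callprocess-sketch (program-and-args)
  "Run PROGRAM-AND-ARGS (a list of strings) without a shell; return its stdout."
  (with-temp-buffer
    ;; DESTINATION t inserts the output of the program in the current buffer.
    (apply #'call-process (car program-and-args) nil t nil
           (cdr program-and-args))
    (buffer-string)))

;; The apostrophe and the space reach the program as part of one argument.
(my-callprocess-sketch '("echo" "it's a file.pdf"))
```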

diff --git a/ChangeLog b/ChangeLog
index e153724..9d620b1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,10 +1,24 @@
+2019-03-05  Eduardo Ochs  <address@hidden>
+
+       * eev-intro.el (find-eev-quick-intro): added material in the
+       sections about links to PDFs.
+
+2019-03-04  Eduardo Ochs  <address@hidden>
+
+       * eev-pdflike.el (ee-pdftotext-replace-bad-ffs): new function.
+       (find-sh-page): use `find-callprocess00' and
+       `ee-pdftotext-replace-bad-ffs'.
+       (ee-find-pdf-text): return a list instead of a string.
+       (ee-find-pdftotext-text): return a list instead of a string.
+
 2019-03-03  Eduardo Ochs  <address@hidden>
 
+       * eejump.el: rewrote most comments, deleted some `eejump-<nnn>'s, and made
+       `eejump-6': point to (find-escripts-intro).
+
        * eev-elinks.el (ee-find-intro-links): set the correct default
        value for `stem'.
 
-       * eejump.el (eejump-6): point to (find-escripts-intro).
-
 2019-03-02  Eduardo Ochs  <address@hidden>
 
        * eev-anchors.el: converted to utf-8.
diff --git a/VERSION b/VERSION
index 00ed0a0..4a3d8a9 100644
--- a/VERSION
+++ b/VERSION
@@ -1,2 +1,2 @@
-Mon Mar  4 00:46:47 GMT 2019
-Sun Mar  3 21:46:47 -03 2019
+Tue Mar  5 03:20:28 GMT 2019
+Tue Mar  5 00:20:28 -03 2019
diff --git a/eev-codings.el b/eev-codings.el
index 468a905..2f9b0a6 100644
--- a/eev-codings.el
+++ b/eev-codings.el
@@ -19,7 +19,7 @@
 ;;
 ;; Author:     Eduardo Ochs <address@hidden>
 ;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version:    2019feb24
+;; Version:    2019mar04
 ;; Keywords:   e-scripts
 ;;
 ;; Latest version: <http://angg.twu.net/eev-current/eev-coding.el>
@@ -37,7 +37,7 @@
 ;; files; the functions defined here make the local variables section
 ;; trick unneccessary - `ee-format-as-anchor' now uses `ee-tolatin1'
 ;; to produce a search string that works both unibyte, on UTF-8, on
-;; latin-1 files and some (most?) other encodings.
+;; latin-1 files and some (most of?) other encodings.
 ;;
 ;; NOTE: `ee-tolatin1' a hack! Conversion to latin-1 seems to work in
 ;; most cases, but I don't understand very well the reasons why... I
@@ -52,6 +52,28 @@
 ;;   http://angg.twu.net/e/emacs.e.html#unibyte-2019-search
 ;;   http://angg.twu.net/e/emacs.e.html#creating-utf8-files
 ;;   http://angg.twu.net/e/emacs.e.html#ee-re-to
+;;
+;;
+;; NOTE 2: Sorry for taking so long!! Here's what happened. This page
+;;
+;;   http://angg.twu.net/glyphs.html
+;;
+;; tells a bit about the hacked 256-char fonts that I created many
+;; years before UTF-8 became standard, and that I used for ages in
+;; some of my notes and .tex files... I wanted to maintain
+;; compatibility with the files that used those fonts, and this turned
+;; out to be very hard - these hacked fonts only worked in files and
+;; buffers in which the encoding was "raw-text",
+;;
+;;   (find-elnode "Non-ASCII Characters")
+;;   (find-elnode "Disabling Multibyte" "unibyte")
+;;   (find-elnode "Disabling Multibyte" "raw-text")
+;;
+;; and before 2019 I had a *very* poor understanding of how Emacs
+;; converts between unibyte and multibyte and between raw-text,
+;; latin-1 and utf-8...
+
+
 
 ;; «.ee-tolatin1»      (to "ee-tolatin1")
 ;; «.ee-tolatin1-re»   (to "ee-tolatin1-re")
diff --git a/eev-intro.el b/eev-intro.el
index 5fb50fe..b7ca4e7 100644
--- a/eev-intro.el
+++ b/eev-intro.el
@@ -20,7 +20,7 @@
 ;;
 ;; Author:     Eduardo Ochs <address@hidden>
 ;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version:    2019mar03
+;; Version:    2019mar05
 ;; Keywords:   e-scripts
 ;;
 ;; Latest version: <http://angg.twu.net/eev-current/eev-intro.el>
@@ -1396,6 +1396,12 @@ If you run these sexps
 
 then these hyperlinks should work:
 
+  (find-livesofanimalspage)
+  (find-livesofanimalstext)
+  (find-livesofanimalspage (+ -110 113))
+  (find-livesofanimalstext (+ -110 113))
+  (find-livesofanimalspage (+ -110 113) \"LECTURE I.\")
+  (find-livesofanimalstext (+ -110 113) \"LECTURE I.\")
   (find-livesofanimalspage (+ -110 127) \"wrong thoughts\")
   (find-livesofanimalstext (+ -110 127) \"wrong thoughts\")
   (find-livesofanimalspage (+ -110 132) \"into the place of their victims\")
@@ -1413,7 +1419,46 @@ then these hyperlinks should work:
   (find-livesofanimalspage (+ -110 164) \"last common ground\")
   (find-livesofanimalstext (+ -110 164) \"last common ground\")
 
-[To do: explain them]
+The sexps like `(+ -110 113)' are a bit mysterious at first
+sight. We are accessing a PDF that is an excerpt of a book. The
+third page of the PDF has a \"[113]\" at its footer to indicate
+that it is page 113 of the book. Let's use the terms _page
+number_ and _page label_ to distinguish the two numberings: in
+this case, the page whose page number is 3 is the page whose page
+label is 113. These two sexps
+
+  (find-livesofanimalspage (+ -110 113))
+  (find-livesofanimalspage 3)
+
+are equivalent, but the first one is more human-friendly: the 113
+is a page label, and the -110 is an adjustment (we call it the
+\"offset\") to convert the 113 that humans prefer to see into
+the 3 that xpdf needs to receive.
+
+Note that the sexp
+
+  (find-livesofanimalstext 3)
+
+converts the PDF of the \"Lives of Animals\" book to text and
+goes to \"page 3\" on it by counting formfeeds from the beginning
+of the buffer, as explained here:
+
+  (find-enode \"Pages\" \"formfeed\")
+
+In this pair of sexps,
+
+  (find-livesofanimalspage (+ -110 113) \"LECTURE I.\")
+  (find-livesofanimalstext (+ -110 113) \"LECTURE I.\")
+
+the first one goes to page 3 of the PDF and ignores the string
+\"LECTURE I.\" (which is there just for humans, as a reminder of
+what is important on that page); the second sexp goes to page 3
+of the PDF converted to text, searches for the string \"LECTURE
+I.\" and places the cursor right after the end of it.
+
+In section 10.3 we will see how to generate with just a few
+keystrokes a short hyperlink to a page of a PDF and a short
+hyperlink to a string in a page of a PDF.
 
 
 
@@ -1473,14 +1518,89 @@ that will run something similar to:
 
   (find-einfo-links \"(elisp)Top\")
 
+The code that produces the short hyperlink to an info node is not
+currently very smart. If you look at the definition of
+`find-elnode' here
+
+  (find-code-c-d \"el\" ee-emacs-lisp-directory \"elisp\")
+
+you will see that it saves the \"el\" and the \"elisp\" in global
+variables by running this:
+
+  (setq ee-info-code \"el\")
+  (setq ee-info-file \"elisp\")
+
+The short hyperlink to an info node is only produced when Info is
+visiting a node in a manual whose name matches the variable
+`ee-info-file'.
+
 
 
 
 10.3. Generating short hyperlinks to intros
 -------------------------------------------
+Let's see an example. If you follow this link and type `M-h M-h',
+
+  (find-multiwindow-intro)
+
+you will get an \"*Elisp hyperlinks*\" buffer whose last line
+will be:
+
+  # (find-multiwindow-intro)
+
+which is a short hyperlink to the intro.
+
+
+
 
 10.3. Generating short hyperlinks to PDFs
 -----------------------------------------
+We saw in sections 9.3 and 9.4 that after the right preparations
+the first of these hyperlinks
+
+  (find-livesofanimalspage (+ -110 134) \"woke up haggard in the mornings\")
+  (find-livesofanimalstext (+ -110 134) \"woke up haggard in the mornings\")
+
+opens a PDF in a certain page using xpdf, and the second one
+opens in an Emacs buffer the result of converting that PDF to
+text, goes to a certain page in it and searches for a string.
+
+It is difficult to make xpdf send information to Emacs, so this
+trick uses the second link. Run this,
+
+  (find-livesofanimalstext (+ -110 134) \"woke up haggard in the mornings\")
+
+mark a piece of text in it - for example, the \"no punishment\"
+at the end of the first paragraph - and copy it to the kill ring
+with `M-w'. Then type `M-h M-p' (`find-pdflike-page-links'); note
+that `M-h M-h' won't work here because `find-here-links' is not
+smart enough to detect that we are on a PDF converted to text.
+You will get an \"*Elisp hyperlinks*\" buffer that contains these
+links:
+
+  # (find-livesofanimalspage 24)
+  # (find-livesofanimalstext 24)
+  # (find-livesofanimalspage (+ -110 134))
+  # (find-livesofanimalstext (+ -110 134))
+
+  # (find-livesofanimalspage 24 \"no punishment\")
+  # (find-livesofanimalstext 24 \"no punishment\")
+  # (find-livesofanimalspage (+ -110 134) \"no punishment\")
+  # (find-livesofanimalstext (+ -110 134) \"no punishment\")
+
+Remember that we called `code-pdf-page' and `code-pdf-text' as:
+
+  (code-pdf-page \"livesofanimals\" l-o-a)
+  (code-pdf-text \"livesofanimals\" l-o-a -110)
+
+The extra argument \"-110\" to `code-pdf-text' tells `M-h M-p' to
+use \"-110\" as the offset.
+
+
+
+
+10.4. Generating short hyperlinks to anchors
+--------------------------------------------
 
 
 
@@ -6024,113 +6144,36 @@ For more information see:
 \(Re)generate: (find-templates-intro)
 Source code:  (find-eev \"eev-intro.el\" \"find-templates-intro\")
 More intros:  (find-eev-quick-intro)
-              (find-eval-intro)
-              (find-eepitch-intro)
+              (find-escripts-intro)
+              (find-links-conv-intro)
+              (find-eev-intro)
 This buffer is _temporary_ and _editable_.
 Is is meant as both a tutorial and a sandbox.
 
 
-`ee-template0'
-==============
-\(find-efunctiondescr 'ee-template0)
-\(find-efunction      'ee-template0)
-
-
-`ee-H', `ee-S', `ee-HS'
-=======================
-
-
-
-`find-find-links-links'
-=======================
-\(find-links-intro)
-\(find-find-links-links)
-\(find-efunction 'ee-stuff-around-point)
-interactive
-
-
-`find-elinks'
-=============
-\(find-efunction 'find-elinks)
-
-
-
- (find-intro-links)
-\(find-eev \"eev-tlinks.el\" \"find-intro-links\")
-\(find-eevfile \"eev-tlinks.el\")
-
-
 
-The innards: templates
-======================
-Several functions in eev besides `code-c-d' work by replacing
-some substrings in \"templates\"; they all involve calls to
-either the function `ee-template0', which is simpler, or to
-`ee-template', which is much more complex.
-
-The function `ee-template0' receives a single argument - a
-string, in which each substring surrounded by `{...}'s is to be
-replaced, and replaces each `{...}' by the result of evaluating
-the `...' in it. For example:
-
-  (ee-template0 \"a{(+ 2 3)}b\")
-            --> \"a5b\"
+This intro is currently GARBAGE.
+It should be rewritten to become a tutorial on:
 
-Usually the contents of each `{...}' is the name of a variable,
-and when the result of evaluating a `{...}' is a string the
-replacement does not get `\"\"'s.
+  1) How to use `ee-template0' and `find-elinks':
 
-The function `ee-template' receives two arguments, a list and a
-template string, and the list describes which `{...}' are to be
-replaced in the template string, and by what. For example, here,
+      (find-eev \"eev-wrap.el\" \"ee-template0\")
+      (find-eev \"eev-elinks.el\" \"find-elinks\")
 
-  (let ((a \"AA\")
-        (b \"BB\"))
-    (ee-template '(a
-                   b
-                   (c \"CC\"))
-      \"_{a}_{b}_{c}_{d}_\"))
+  2) A review of the conventions here:
 
-      --> \"_AA_BB_CC_{d}_\"
+      (find-links-conv-intro)
+      (find-links-conv-intro \"3. Classification\")
 
-the \"{d}\" is not replaced. Note that the list (a b (c \"CC\"))
-contains some variables - which get replaced by their values -
-and a pair, that specifies explicitly that every \"{c}\" should
-be replaced by \"CC\".
+  3) How some template functions like these
 
+      (find-eev \"eev-tlinks.el\" \"find-find-links-links\")
+      (find-eev \"eev-tlinks.el\" \"find-intro-links\")
+      (find-eev \"eev-wrap.el\" \"find-eewrap-links\")
 
+    are used to create first versions for several functions in
+    eev...
 
-
-Templated buffers
-=================
-Introduction
-Conventions:
-  the first line regenerates the buffer,
-  buffer names with \"**\"s,
-  (find-evariable 'ee-buffer-name)
-  code
-
-`find-elinks'
-=============
-Variant: `find-elinks-elisp'
-
-`find-e*-links'
-===============
-\(find-eev \"eev-elinks.el\")
-
-`find-*-intro'
-==============
-
-`eewrap-*'
-==========
-
-Experiments
-===========
-\(find-efunction 'find-youtubedl-links)
-\(find-efunction 'ee-hyperlinks-prefix)
-\(find-efunction 'find-newhost-links)
-\(find-efunction 'find-eface-links)
-  Note that there is no undo.
 " rest)))
 
 ;; (find-templates-intro)
diff --git a/eev-pdflike.el b/eev-pdflike.el
index 8e193c4..1ce2477 100644
--- a/eev-pdflike.el
+++ b/eev-pdflike.el
@@ -19,7 +19,7 @@
 ;;
 ;; Author:     Eduardo Ochs <address@hidden>
 ;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version:    2019mar02
+;; Version:    2019mar04
 ;; Keywords:   e-scripts
 ;;
 ;; Latest version: <http://angg.twu.net/eev-current/eev-pdflike.el>
@@ -173,13 +173,25 @@
          (t (error "This is not a valid pos-spec: %S" pos-spec)))
     (if rest (ee-goto-rest rest))))
 
+(defun ee-pdftotext-replace-bad-ffs (bigstr)
+"Convert formfeeds that are preceded by non-newline chars into something else.
+Sometimes pdftotext returns \"spurious formfeeds\" that correspond
+not to page breaks but to special printable characters, and these
+spurious formfeeds confuse `ee-goto-position-page'. This function
+finds sequences of spurious formfeeds using a heuristic that works
+in most cases - formfeeds following something that is not a
+newline are spurious - and replaces them by \"(ff)\"."
+  (replace-regexp-in-string  
+   "\\([^\n\f]\\)\\(\f+\\)" "\\1(ff)" bigstr t))
+
 ;; «find-sh-page» (to ".find-sh-page")
-(defun find-sh-page (command &rest pos-spec-list)
-  "Like `find-sh', but interpreting the car of POS-SPEC-LIST as a page."
+(defun find-sh-page (program-and-args &rest pos-spec-list)
+  "Like `find-sh', but interpreting the car of POS-SPEC-LIST as a page number."
   (interactive "sShell command: ")
   (find-eoutput-reuse
-   command
-   `(insert (shell-command-to-string ,command)))
+   (ee-unsplit program-and-args)
+   `(insert (ee-pdftotext-replace-bad-ffs
+            (find-callprocess00 ,'program-and-args))))
   (apply 'ee-goto-position-page pos-spec-list))
 
 
@@ -323,11 +335,17 @@
 ;; (find-code-xxxpdftext-family "pdf-text")
         (code-xxxpdftext-family "pdf-text")
 
+;; (defun ee-find-pdf-text (fname)
+;;   (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+;; 
+;; (defun ee-find-pdftotext-text (fname)
+;;   (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+
 (defun ee-find-pdf-text (fname)
-  (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+  `("pdftotext" "-layout" "-enc" "Latin1" ,(ee-expand fname) "-"))
 
 (defun ee-find-pdftotext-text (fname)
-  (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+  `("pdftotext" "-layout" "-enc" "Latin1" ,(ee-expand fname) "-"))
 
 
 
diff --git a/eev-wrap.el b/eev-wrap.el
index cea487a..e0ab95c 100644
--- a/eev-wrap.el
+++ b/eev-wrap.el
@@ -40,6 +40,7 @@
 ;; «.ee-template0»             (to "ee-template0")
 ;; «.ee-S»                     (to "ee-S")
 ;; «.ee-this-line-wrapn»       (to "ee-this-line-wrapn")
+;; «.find-eewrap-links»                (to "find-eewrap-links")
 
 
 
@@ -554,6 +555,7 @@ cd     {dir}"))
 {<}(ee-HS `(find-{stem} ,{args})){>}\"))\n")))
 
 
+;; «find-eewrap-links» (to ".find-eewrap-links")
 ;; A more standard way to create `eewrap-*' functions.
 ;; (find-find-links-links "<none>" "eewrap" "C stem args")
 ;;
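
Note on page numbers and page labels: the new text in eev-intro.el explains
that `(+ -110 113)' converts the page label 113 into the page number 3, and
that the `...text' links reach a page by counting formfeeds from the
beginning of the buffer. The sketch below shows both ideas on a made-up
three-page buffer. The function `my-goto-page-sketch' is hypothetical and is
not the implementation of `ee-goto-position-page'.

```elisp
;; Hypothetical helper; eev's own page navigation may differ.
(defun my-goto-page-sketch (page)
  "Move point to the start of PAGE (1-based) by skipping PAGE-1 formfeeds."
  (goto-char (point-min))
  (when (> page 1)
    ;; The fourth argument makes `search-forward' stop after the
    ;; (PAGE-1)-th formfeed.
    (search-forward "\f" nil nil (1- page)))
  (point))

(with-temp-buffer
  (insert "first\n\fsecond\n\fthird\n")
  ;; Page label 113 with offset -110 is page number 3.
  (my-goto-page-sketch (+ -110 113))
  (buffer-substring (point) (line-end-position)))
;; => "third"
```

This also shows why spurious formfeeds matter: an extra formfeed in the
middle of a page would make every later page count come out one too high.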


