Re: [Groff] pdfmom grep (was parallel text processing)
From: Ralph Corderoy
Date: Sun, 10 Sep 2017 11:12:21 +0100
Hi Peter,
> The pipeline in the current pdfmom is actually
>
> groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 |
> grep '^\\. *ds' |
> groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring
> 2>&1 |
> grep '^\\. *ds' |
> groff -Tpdf -mom $preconv - $cmdstring
...
> ***pdfmom pipeline entered literally at the command line
> groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom 2>&1 | \
> grep '^\. *ds' | \
> groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z -k - camus.mom 2>&1 | \
> grep '^\. *ds' | \
> groff -Tpdf -mom -k - camus.mom > camus.pdf
> - grep does not report a binary file hit
The middle groff's `-k -' are swapped, but I don't think that affects
anything. (BTW, the backslashes aren't needed after a pipe; by design,
that indicates the line continues.)
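A small sketch of that point, using stand-in commands rather than groff: a line ending in `|` is an incomplete command, so the shell keeps reading without needing a backslash.

```shell
# Each line ends with `|', so the shell treats the next line as the
# continuation of the same pipeline. No trailing backslashes needed.
out=$(printf 'a\nb\n' |
    grep b |
    tr b B)
echo "$out"
```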
> ***pdfmom itself at the command line
> pdfmom -k camus.mom > camus.pdf
> - grep reports a binary file hit
>
> strace on 'pdfmom -k camus.mom > camus.pdf' produces
I've neatened this up a bit, and show non-zero exits, and a SIGPIPE.
> pdfmom -k camus.mom
> sh -c groff -Tpdf -dLABEL.REFS=1 -mom ...
> groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom
> exit(1) grep ^\\. *ds
> groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - -k camus.mom
> grep ^\\. *ds
> groff -Tpdf -mom -k - camus.mom
> preconv - camus.mom
> PIPE troff -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z -Tpdf
> preconv camus.mom
> troff -dLABEL.REFS=1 -mom -z -Tpdf
> troff -mom -Tpdf
> preconv - camus.mom
> gropdf
Let's go through it a step at a time to see if I can get across the
problem. Back to pdfmom...
> groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 |
> grep '^\\. *ds' |
> groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring
> 2>&1 |
> grep '^\\. *ds' |
> groff -Tpdf -mom $preconv - $cmdstring
I run that manually.
$ preconv=-k
$ cmdstring=camus.mom
$ groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1
camus.mom:18: can't translate character code 233 to special
character `'e' in transparent throughput
$
There's no /^\.ds/ in that output, which explains why the first grep in
strace's output exited with status 1. So the stdin to the second groff
is empty, and in this case it's as if the first groff and grep didn't
exist. On to the second groff.
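The exit status can be checked without groff. This is only a sketch: the `warning` string is a shortened stand-in for the first groff's output, not its real text.

```shell
# Stand-in for the first groff's 2>&1 output: a warning, but no `.ds' line.
warning="camus.mom:18: can't translate character code 233"

# grep exits 1 when nothing matches, and writes nothing to stdout.
if out=$(printf '%s\n' "$warning" | grep '^\. *ds'); then
    status=0
else
    status=$?
fi
echo "status=$status, bytes passed on=${#out}"
```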
$ groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1
^D^D^D^D
.ds pdf:look(pdf:bm1) L'�tranger
camus.mom:18: can't translate character code 233 to special
character `'e' in transparent throughput
$
(Bizarrely, I had to type the TTY's EOF four times before groff stopped
trying to read.)
Here's the problem. Whatever is producing that `.ds' line is writing
ISO 8859-1, and my UTF-8 terminal rightly replaces it with `�', U+FFFD.
The `can't translate' warning, from both this groff and the previous
one, is also telling us there was a problem. Decimal 233 is U+00E9,
which is the `é' in
$ grep TITLE camus.mom
.TITLE "L'étranger
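To see the two encodings side by side, here is a rough sketch that assumes iconv(1) and od(1) are available. The variable names are just for illustration.

```shell
# 233 decimal is 351 octal and 0xE9 hex: é as a single ISO 8859-1 byte.
latin1=$(printf '\351' | od -An -tx1 | tr -d ' \n')

# The same character converted to UTF-8 takes two bytes.
utf8=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO 8859-1: $latin1  UTF-8: $utf8"
```

A lone 0xE9 byte is not a valid UTF-8 sequence, which is why a UTF-8 terminal shows U+FFFD for it.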
This non-UTF-8 is fed into the second grep for /^\.ds/. That grep runs
in your UTF-8 locale and, correctly, reports that standard input,
containing binary, matches rather than passing on the `.ds' line.
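A sketch of the locale dependence, with a hypothetical `emit` function standing in for the second groff. What the first grep prints depends on the grep implementation and the caller's locale. The comments assume a reasonably recent GNU grep.

```shell
# Stand-in for groff's output: é is the single byte 0xE9 (octal 351).
emit() { printf ".ds pdf:look(pdf:bm1) L'\351tranger\n"; }

# In the caller's locale. Under a UTF-8 locale, GNU grep is expected to
# report that binary standard input matches instead of printing the line.
emit | grep '^\. *ds'

# In the C locale recent GNU grep treats every byte as a valid character,
# so the line should pass through. od -c shows the 0xE9 byte as 351.
emit | LC_ALL=C grep '^\. *ds' | od -c

# Either way, exactly one line matches the pattern.
c_count=$(emit | LC_ALL=C grep -c '^\. *ds')
echo "matching lines: $c_count"
```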
To investigate why it doesn't occur when you run the pipeline manually,
insert tee(1)s to snaffle the mid-pipeline data, or simply start with
the first command and tack on one more command on each subsequent run.
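The tee(1) approach might look something like this. It's a sketch only: `stage` is a hypothetical stand-in for the middle groff and the file names are made up. In the real pipeline the tee(1)s would sit either side of the second grep.

```shell
# Hypothetical stand-in for the middle groff: a `.ds' line holding the
# ISO 8859-1 byte 0xE9 (octal 351), followed by a warning.
stage() {
    printf ".ds pdf:look(pdf:bm1) L'\351tranger\n"
    printf "camus.mom:18: can't translate character code 233\n"
}

tmp=$(mktemp -d)    # scratch directory for the captured data

# tee copies its stdin to the named file and passes it on unchanged.
stage |
    tee "$tmp/before-grep" |
    LC_ALL=C grep '^\. *ds' |
    tee "$tmp/after-grep" >/dev/null

# od -c prints the raw bytes, so an unconverted 0xE9 shows up as 351.
od -c "$tmp/before-grep"
od -c "$tmp/after-grep"
```

Comparing the two captures shows exactly what grep received and what it passed on.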
Have you a ~/bin/grep that alters the locale?
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy