Re: "transparent" output and throughput, demystified
From: Deri
Subject: Re: "transparent" output and throughput, demystified
Date: Sat, 31 Aug 2024 17:07:28 +0100
On Saturday, 31 August 2024 00:07:57 BST G. Branden Robinson wrote:
> It would be cleaner and simpler to provide a mechanism for processing a
> string directly, discarding escape sequences (like vertical motions or
> break points [with or without hyphenation]). This point is even more
> emphatic because of the heavy representation of special characters in
> known use cases. That, to "sanitize" (or "pdfclean") such strings by
> round-tripping them through a process that converts a sequence of easily
> handled bytes like "\ [ 'a ]" or "\ [ u 0 4 1 1 ]" into a special
> character node and then back again seems wasteful and fragile to me.
Hi Branden,
This would be great, but I see some problems with the current code. Doing
this:-
[derij@pip build (master)]$ echo ".device \[u012F]"|./test-groff -Tpdf -Z | grep "^x X"
x X \[u012F]
[derij@pip build (master)]$ echo "\X'\[u012F]'"|test-groff -Tpdf -Z | grep "^x X"
x X \[u0069_032]
This shows that \[u012F] has been decomposed (wrongly!) by \X. Whilst
decomposing might make sense for the text stream, since afmtodit keys the
glyphs on the decomposed Unicode, I would love to know why we decompose at
all: none of our fonts include combining diacritical mark glyphs, so neither
grops nor gropdf has a chance to synthesise a glyph from its constituent parts
if it is not present in the font! Given that the purpose of \X is to pass
meta-data to output drivers, which will probably convert it to UTF-8 or UTF-16,
it seems odd to decompose the output from preconv (UTF-16) before passing it to
the output driver. The .device request does not.
The correct decomposition of 012F is 0069_0328, so the wrong output above is
just a string truncation bug.
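
For anyone who wants to check the expected decomposition independently of
groff, here is a minimal Python sketch. It uses the standard unicodedata
module, not groff's own tables, and the helper name groff_decomposed_name is
made up for this illustration.

```python
import unicodedata


def groff_decomposed_name(codepoint: int) -> str:
    """Build a groff-style uXXXX_YYYY name from a code point's canonical
    decomposition (NFD). Hypothetical helper; this is not groff's code."""
    nfd = unicodedata.normalize("NFD", chr(codepoint))
    # Join the decomposed code points as upper-case hex, at least 4 digits.
    return "u" + "_".join(f"{ord(c):04X}" for c in nfd)


if __name__ == "__main__":
    # U+012F is LATIN SMALL LETTER I WITH OGONEK.
    print(groff_decomposed_name(0x012F))  # prints u0069_0328
```

The name seen in the \X output, u0069_032, is this string with its last
character missing, which fits the truncation theory.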
Just like you, I would like to avoid "round-tripping": UTF-16 (preconv) ->
decomposed (troff) -> UTF-16 (gropdf). This does not currently affect grops,
which does not support anything beyond 8-bit ASCII. Do you agree it makes more
sense for \X to pass \[u012F] rather than \[u0069_0328]?
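
To illustrate the extra step a driver is left with when it receives the
decomposed form, here is a rough Python sketch of recomposing such a name. It
is only an assumption about what the driver side would need to do, not what
gropdf actually does, and recompose_groff_name is a made-up helper.

```python
import unicodedata


def recompose_groff_name(name: str) -> str:
    """Turn a groff-style uXXXX_YYYY name back into its composed form (NFC).
    Hypothetical helper for illustration only."""
    if not name.startswith("u"):
        raise ValueError(f"not a uXXXX name: {name!r}")
    # Each underscore-separated field is one hexadecimal code point.
    chars = "".join(chr(int(part, 16)) for part in name[1:].split("_"))
    composed = unicodedata.normalize("NFC", chars)
    return "u" + "_".join(f"{ord(c):04X}" for c in composed)


if __name__ == "__main__":
    print(recompose_groff_name("u0069_0328"))  # prints u012F
```

Passing \[u012F] straight through would make this step unnecessary. Note also
that the truncated name u0069_032 cannot be recovered this way, since 032 is
read as a different code point.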
Cheers
Deri