groff

Re: "transparent" output and throughput, demystified


From: Deri
Subject: Re: "transparent" output and throughput, demystified
Date: Sat, 31 Aug 2024 17:07:28 +0100

On Saturday, 31 August 2024 00:07:57 BST G. Branden Robinson wrote:
> It would be cleaner and simpler to provide a mechanism for processing a
> string directly, discarding escape sequences (like vertical motions or
> break points [with or without hyphenation).  This point is even more
> emphatic because of the heavy representation of special characters in
> known use cases.  That, to "sanitize" (or "pdfclean") such strings by
> round-tripping them through a process that converts a sequence of easily
> handled bytes like "\ [ 'a ]" or "\ [ u 0 4 1 1 ]" into a special
> character node and then back again seems wasteful and fragile to me.

Hi Branden,

This would be great, but I see some problems with the current code. Doing 
this:-

[derij@pip build (master)]$ echo ".device \[u012F]"|./test-groff -Tpdf -Z | grep "^x X"
x X \[u012F]
[derij@pip build (master)]$ echo "\X'\[u012F]'"|test-groff -Tpdf -Z | grep "^x X"
x X \[u0069_032]
This shows that \[u012F] has been decomposed (wrongly!) by \X, whereas
.device passes it through untouched. Decomposition might make sense for the
text stream, since afmtodit keys the glyphs on the decomposed Unicode, but I
would love to know why we decompose at all: none of our fonts include
combining diacritical mark glyphs, so neither grops nor gropdf has a chance
to synthesise a glyph from its constituent parts if it is not present in the
font! Given that the purpose of \X is to pass meta-data to output drivers,
which will probably convert it to UTF-8 or UTF-16, it seems odd to decompose
the output from preconv (utf16) before passing it to the output driver.

The correct decomposition of 012F is 0069_0328, so the wrong output above is
just a string truncation bug.
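
To double-check the expected value independently of groff, here is a minimal
Python sketch. It assumes that groff's decomposition of this character matches
Unicode canonical decomposition (NFD); the helper name groff_decomposed_name
is made up for illustration and is not part of groff.

```python
import unicodedata


def groff_decomposed_name(ch: str) -> str:
    """Return a groff-style uXXXX[_XXXX...] name for the NFD form of ch.

    Hypothetical helper: it only mimics the naming convention seen in the
    "x X" output above, assuming groff's table agrees with Unicode NFD.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)


name = groff_decomposed_name("\u012F")
print(name)       # u0069_0328
print(name[:-1])  # u0069_032, the string that \X actually emitted
```

The second line printed is the full name with its last character dropped,
which matches what \X produced and is consistent with a truncation.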

Just like you, I would like to avoid "round-tripping": utf16 (preconv) ->
decomposed (troff) -> utf16 (gropdf). This does not currently affect grops,
which does not support anything beyond 8-bit ASCII. Do you agree it makes
more sense for \X to pass \[u012F] rather than \[u0069_0328]?
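
To illustrate the extra work the round trip puts on a driver, here is a rough
Python sketch of the last step: turning a decomposed name back into a composed
character and then into UTF-16. It is illustrative only and is not how gropdf
is written; the function name groff_name_to_text and the accepted name pattern
are assumptions of mine.

```python
import re
import unicodedata


def groff_name_to_text(name: str) -> str:
    """Turn a 'uXXXX[_XXXX...]' name into composed (NFC) text.

    Hypothetical helper; the pattern below is an assumption about the
    names a driver would see, not a statement of groff's grammar.
    """
    if not re.fullmatch(r"u[0-9A-F]{4,6}(_[0-9A-F]{4,6})*", name):
        raise ValueError(f"not a uXXXX[_XXXX] name: {name!r}")
    # Each underscore-separated field is one code point in hexadecimal.
    chars = "".join(chr(int(part, 16)) for part in name[1:].split("_"))
    # Recompose, undoing the decomposition that troff applied.
    return unicodedata.normalize("NFC", chars)


composed = groff_name_to_text("u0069_0328")
print(f"U+{ord(composed):04X}")            # U+012F
print(composed.encode("utf-16-be").hex())  # 012f
```

If \X passed \[u012F] through as .device does, the recomposition step would
not be needed.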

Cheers 

Deri





