groff
Re: Novel use of .char


From: G. Branden Robinson
Subject: Re: Novel use of .char
Date: Tue, 17 Dec 2024 11:52:50 -0600

At 2024-12-16T16:35:22+0000, Deri wrote:
> On Sunday, 15 December 2024 23:54:08 GMT G. Branden Robinson wrote:
> > Crudely, it _looks_ like the baseline might be getting shifted
> > upward by the vertical dimension of the image, as if placement of
> > the image-based character "knew" it needed to compensate for the
> > implicit vertical motion caused by inlining it.  If so, I will be
> > relieved, because we can now regard that fixup as a kludge, and take
> > it out.  If I can find it.
> > 
> > But I could be wrong, and there's no policy around any of this, just
> > a bunch of things that happened to work together, never got specced,
> > and never got unit-tested.  :(
> 
> The string register "gnu" contains this:-
> 
> Move up : Output image from separate environment : Move right.

Agreed.  For those following along at home:

>>> .ds gnu \v'-\n[img-d]u'\[img]\h'\n[img-w]u'
            11111111111111122222233333333333333

> Comparing the difference in the grout from 1.23.0 and current, you can
> see two differences:-
> 
> @@ -73,10 +71,9 @@
>  tGNU
>  wh2500
>  thead
> -wh2500
> +wx X pdf: pdfpic EXPERIMENTS/GNU-head-small.pdf
> +wh30000
>  V86000
> -x X pdf: pdfpic EXPERIMENTS/GNU-head-small.pdf
> -wh27500
>  timage.
>  n12000 0
>  V132000
> 
> The space after the word "head" is dropped and two spaces are added
> after the image instead. Also the "move up" now occurs after the call
> to pdfpic.
> 
> I think this is related to your changes to \X and decisions of when to
> flush changes.

Yes.  Reverting the following commit fixes the problem.

commit f865d4ac91e2b07ee4becfd3f8313cf56ed57863
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date:   Sun Sep 8 19:02:37 2024 -0500

    [troff]: Refactor.

    * src/roff/troff/node.cpp (class troff_output_file): Drop `tf` and
      `gcol` arguments from declaration, because...

      (troff_output_file::start_device_extension): Cease output of
      commands to update the font, stroke color, and drawing position.
      Drop now-unused `tf` and `gcol` arguments.

      (device_extension_node::tprint_start): Drop `tf` and `gcol`
      arguments from call site.

    * tmac/tests/an_MR-works.sh: Update test expectations.  The expected
      commands shift location by two lines, but the output still behaves as
      desired.

I'll admit that changing the test expectations prompted a twitch in my
brain back when I committed it.  But with no documentation, no relevant
source comments, no failing test case (in the sense of a visually
detectable difference in output), and nowhere to turn for advice, I
gambled.

I'm glad Peter raised; now that commit has to fold.  :)

And now maybe we can get that test case.

> The fundamental requirement for \X is that it remains in the
> chronology of the output stream. If there is a colour change before \X
> it needs to occur first, if there are word gaps before the \X, they
> should be output first. This is true for any action which changes the
> output, the chronological order of actions in a users document must be
> preserved in the grout.
> 
> Conversely .output and \! do not follow chronologically, they pass
> data to grout as soon as they are actioned, even if a partial line is
> being constructed by groff.

Your statements are a bit too prescriptive for me.  They are neither
documented anywhere, nor reasonably inferable from CSTR #54 or #97.
This is simply the way things happen(ed) to work as implemented.  I
would not bet any money at all that the assumptions you're stating here
can be expected to hold for Heirloom Doctools or neatroff, for instance.
(I feel no urgency to work up an exhibit with which to test my
suspicion, though.  So if someone wants to prove me wrong, here's a
golden opportunity. :) )

As the author of an output driver, and the only one for groff who still
participates in our mailing list discussions, your insight into these
matters is much deeper than almost any user's.

I'll push the reversion of the commit and add an automated test based on
Peter's reproducer if I can figure one out (fortunately pdfmom [and the
path search issues its use implies] is not required; "groff -T pdf"
suffices), but I am increasingly unhappy with the undocumented and
unmotivated[1] distinctions between device control requests and escape
sequences in this area.

Furthermore I think we must develop our terminological lexicon to
support intelligible discussion of these matters.  I find your
distinction between something that happens "chronologically" versus
something that happens "as soon as it is actioned" to be pretty slippery
to get a hold of in the mind.  Both imply temporal sequencing to me.

Maybe it's just _my_ mind that struggles...

As a rule, GNU troff output works like this:

0. Text and formatter directives construct node objects.
1. An output line consists of a sequence of node objects.
2. When it's time to write out an output line (producing "grout"), each
   node object is read and interpreted.

   a. Some nodes are "printable", and directly produce one or more grout
      commands.
   b. Some nodes are not "printable"; instead they alter internal state
      and don't directly produce grout commands, but can influence
      the printing of nodes processed subsequently.

Without having rigorously worked it out, I think that the distinction
you are making maps more or less to 2a and 2b.
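
For readers who want something concrete, the 2a/2b split can be sketched
as a toy model in Python.  This is not GNU troff's C++ code: the class
names, the `emit` method, and the "glyph"/"color" command strings are
all invented for illustration and are not real grout syntax.

```python
from dataclasses import dataclass


class Node:
    """Hypothetical base class; not groff's real node hierarchy."""

    def emit(self, state):
        return []


@dataclass
class Glyph(Node):
    char: str

    def emit(self, state):
        # Case 2a: a "printable" node directly produces output commands.
        cmds = []
        if state["color"] != state["written_color"]:
            # A pending state change surfaces only when something prints.
            cmds.append("color " + state["color"])
            state["written_color"] = state["color"]
        cmds.append("glyph " + self.char)
        return cmds


@dataclass
class SetColor(Node):
    color: str

    def emit(self, state):
        # Case 2b: alters internal state and writes nothing itself.
        state["color"] = self.color
        return []


def write_line(nodes):
    """Walk one output line's nodes in order, collecting commands."""
    state = {"color": "default", "written_color": "default"}
    out = []
    for node in nodes:
        out.extend(node.emit(state))
    return out


print(write_line([Glyph("a"), SetColor("red"), Glyph("b")]))
# → ['glyph a', 'color red', 'glyph b']
```

The `SetColor` node writes nothing on its own; its effect shows up only
when the next printable node is processed.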

--- BEGIN DIGRESSION ---

Let me use the new `pline` request I added for groff 1.24 to illustrate.

$ printf 'Hello.\n.br\nfoo\\%%bar\n.pline\n' | groff -Z >OUT 2>ERR

First, the grout:

$ cat OUT
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s10000
V12000
H72000
md
DFd
tHello.
n12000 0
V24000
H72000
tfoobar
n12000 0
x trailer
V792000
x stop

Now, the node list corresponding to the "foobar" output line.

$ cat ERR
{type: line_start_node, diversion level: 0},
{type: glyph_node, character: "f", diversion level: 0},
{type: glyph_node, character: "o", diversion level: 0},
{type: dbreak_node, diversion level: 0, none: {type: glyph_node, character: "o", diversion level: 0}, pre: {type: glyph_node, character: "\hy", diversion level: 0}},
{type: glyph_node, character: "b", diversion level: 0},
{type: glyph_node, character: "a", diversion level: 0},
{type: glyph_node, character: "r", diversion level: 0}

Glyph nodes are examples of stuff that gets printed immediately.  More
or less.  Thanks to the "t" grout command (a GNU troff extension),
glyphs with all the same properties except for glyph index and which are
simply output with advance of the drawing position by the glyph width
(no breaks, no overstriking) and which aren't subject to track kerning
(we're really piling on the exceptions here) accumulate into a `tbuf`.
The "t" command seems to be almost completely a concession to
output readability, and I have sometimes wondered if we could use a knob
somewhere that would turn it off and caveman back to 'c' and 'h'
commands in strict alternation for the same output, as Kernighan's first
cut of device-independent troff must have done before having to
"optimize" the output.  See CSTR #97.

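To make the contrast concrete, here is a small Python sketch of the two
output styles for one run of glyphs.  The function names are
hypothetical and the widths are illustrative values, not read from any
font description file; only the meanings of the 'c', 'h', and 't'
commands are taken from the grout format.

```python
def emit_unbuffered(glyphs):
    """Strict alternation: 'c' prints a glyph, 'h' moves right by its width."""
    out = []
    for char, width in glyphs:
        out.append("c" + char)
        out.append("h" + str(width))
    return out


def emit_buffered(glyphs):
    """Accumulate the whole run and write it as a single 't' command."""
    if not glyphs:
        return []
    return ["t" + "".join(char for char, _ in glyphs)]


word = [("f", 3330), ("o", 5000), ("o", 5000)]  # (glyph, illustrative width)

print(emit_unbuffered(word))  # → ['cf', 'h3330', 'co', 'h5000', 'co', 'h5000']
print(emit_buffered(word))    # → ['tfoo']
```
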
A "line start node" is a good example of something that doesn't produce
grout output.  Don't take my word for it; it has no `tprint` member
function.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.h?h=1.23.0#n155

Contrast:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp?h=1.23.0#n1899

(Here the "t" seems to mean "troff"; nothing to do with the "t"
["text"?] GNU troff extension command.)  It's a good thing we still
build monuments to Ken Thompson when deciding how to name things.  Every
concept affords a one-letter abbreviation, two at most, and any
ambiguity that arises is purely a figment of the reader's sadly limited
intellect.  You suck at chess, too.

Tellingly, some nodes can "force" a "tprint", meaning: flush the buffer
of accumulated glyphs to be written with the "t" command.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp?h=1.23.0#n1933

There, I think, is where one aspect of the "chronology" manifests.
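
A toy Python model shows why such a forced flush matters for ordering.
It assumes a buffer of pending glyphs and a device extension command
that may or may not flush it first.  `ToyOutput`, its methods, and the
file name `image.pdf` are hypothetical; this is not a rendering of
`troff_output_file`.

```python
class ToyOutput:
    def __init__(self, flush_before_extension=True):
        self.pending = []    # glyphs waiting to go out in one 't' command
        self.commands = []   # commands written so far
        self.flush_before_extension = flush_before_extension

    def put_glyph(self, char):
        self.pending.append(char)

    def flush(self):
        # Write any accumulated glyphs as a single 't' command.
        if self.pending:
            self.commands.append("t" + "".join(self.pending))
            self.pending = []

    def device_extension(self, text):
        if self.flush_before_extension:
            self.flush()
        self.commands.append("x X " + text)

    def end_line(self):
        self.flush()
        return self.commands


def demo(flush_first):
    out = ToyOutput(flush_first)
    for ch in "head":
        out.put_glyph(ch)
    out.device_extension("pdf: pdfpic image.pdf")
    for ch in "image.":
        out.put_glyph(ch)
    return out.end_line()


print(demo(True))   # → ['thead', 'x X pdf: pdfpic image.pdf', 'timage.']
print(demo(False))  # → ['x X pdf: pdfpic image.pdf', 'theadimage.']
```

Without the flush, the device command overtakes text that preceded it in
the input, which is one way document order and output order can diverge.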

We furthermore see that the hyphenation control escape sequence `\%` did
an even more sophisticated thing.  It caused creation of a `dbreak_node`
("d" for "discretionary").

The dbreak node type also illustrates that nodes can contain other
nodes.  Pop quiz: Given a tree of nodes, do we use pre-order, in-order,
or post-order traversal?  I'm sure the answer is obvious to some (or
that they will claim it is).  It isn't to me.  I think it should be
documented.

Let's zoom in on the structure of the dbreak node.

{type: dbreak_node, diversion level: 0, none: {type: glyph_node, character: "o", diversion level: 0}, pre: {type: glyph_node, character: "\hy", diversion level: 0}},

It's not altogether obvious in my opinion, but what this means is: we
can break after this "o" glyph if we must, but if we do, before that
break ("pre") we must write a "\hy" glyph, and I trust that anyone who's
held on with me for this long suspects that this is an internal
representation of the same thing represented by `\[hy]` in the input
language: a hyphen.  You can have a "post" break glyph, too, but I don't
know of any cases where that is exercised.  The curious can consult
Appendix H of _The TeXbook_ for this.  Try not to be intimidated by the
_doubled_ use of the "dangerous bend" icons adjacent to many of the
paragraphs.  ("A special sign is used to designate material that is for
wizards only:")  Apparently this stuff is for double wizards.  :-|
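
The reading of the dbreak node given above can be sketched in Python.
This is a simplified model that follows that reading literally: `none`
is always written, and `pre` is added only if the break is taken.  The
`DBreak` class and `render` function are hypothetical, "-" stands in for
the "\hy" glyph, and the sketch does not try to settle how GNU troff
populates these lists internally.

```python
from dataclasses import dataclass, field


@dataclass
class DBreak:
    none: list                                # glyphs at the break point
    pre: list                                 # written before the break, if taken
    post: list = field(default_factory=list)  # written after the break, if taken


def render(nodes, take_break):
    """Return the output lines for a node list with optional breaks."""
    lines, current = [], []
    for node in nodes:
        if isinstance(node, DBreak):
            current.extend(node.none)
            if take_break:
                current.extend(node.pre)
                lines.append("".join(current))
                current = list(node.post)
        else:
            current.append(node)
    lines.append("".join(current))
    return lines


# Node list modeled on the pline dump for "foo\%bar".
word = ["f", "o", DBreak(none=["o"], pre=["-"]), "b", "a", "r"]

print(render(word, take_break=False))  # → ['foobar']
print(render(word, take_break=True))   # → ['foo-', 'bar']
```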

Over time I hope to enhance this feature to make nodes disclose more
information about themselves.  As I have said before, if an expert user
has to launch GDB to figure out why their document is formatting the way
it is, we have failed them.  There must be a way to force the formatter
to divulge its secrets.

--- END DIGRESSION ---

This is dark water.  I don't know that anyone has ever written down
anything this detailed anywhere about GNU troff's output production
before, and I've looked.  (If I'm wrong, I eagerly welcome citations
thereto.)  The "Gtroff Internals" node of our Texinfo manual passes over
the topic MUCH more breezily.

Getting some documentation into place for these matters would be a good
objective for groff 1.25.  I hope you will continue to use your unique
expertise to throw light into these shadowy crevices.

Regards,
Branden

[1] "Unmotivated" meaning "we don't even have source code comments in
    relevant places shedding light on why certain decisions were made".
