Proposed: QS/QE macros for quotation in man(7)


From: G. Branden Robinson
Subject: Proposed: QS/QE macros for quotation in man(7)
Date: Mon, 16 Dec 2024 08:28:21 -0600

Hi Dave & Alex,

I guess I'd better firm this up a bit.

Synopsis:

.QS
       Begin quotation.  An opening quotation mark is formatted.  The
       line is not broken.

.QE
       End quotation.  A closing quotation mark is formatted.  The line
       is not broken.

At 2024-12-15T23:30:27-0600, Dave Kemper wrote:
> On Sat, Dec 14, 2024 at 12:02 PM G. Branden Robinson
> <g.branden.robinson@gmail.com> wrote:
> >     This is likely due for a cleaned up re-proposal under the new
> >     names `QS`/`QE` as suggested by Doug McIlroy.
> 
> Man pages using these proposed macros--which, since the macros don't
> exist yet, will be man pages edited in 2025 or later--will surely
> never be formatted by a roff that's limited to two-character
> identifiers.  Do other man-parsing tools expect all macros to be two
> characters or less?

I don't feel I have sufficiently broad knowledge of what man-parsing
tools are out there besides *roffs and mandoc(1).  I _think_, based on
parallel behavior, that Michael Kerrisk uses lexgrog(1) from man-db to
extract the summary-description (the body text of the "Name" section)
from pages.  Thomas Dickey maintains one of the several tools named
"man2html" that has existed over the years.[1]

Plan 9 troff is out there, retains the old AT&T/DWB troff limitation of
two-character names, and in fact implemented the newest groff man(7)
macro, `MR`, before we did.[2]  So its macro names are limited to two characters.
That said, I have suspicions that Plan 9 from User Space users don't use
its troff to render non-Plan-9 man pages.

> Or, as the man language is selectively expanded, can its new macros be
> given human-meaningful names?

If I were starting a new man macro package from scratch, I would
certainly do this.  Since I'm not, I find it difficult to locate much
value in a macro language that's only _partly_ human-readable.  On top
of the existing crypticness, we'd be adding inconsistency.

At 2024-12-16T11:19:32+0100, Alejandro Colomar wrote:
> Hi Branden, Dave,
> 
> On Sun, Dec 15, 2024 at 11:30:27PM -0600, Dave Kemper wrote:
> > On Sat, Dec 14, 2024 at 12:02 PM G. Branden Robinson
> > <g.branden.robinson@gmail.com> wrote:
> > >     This is likely due for a cleaned up re-proposal under the new
> > >     names `QS`/`QE` as suggested by Doug McIlroy.
> 
> I still don't know what to expect of those macros.  Could you please
> send some examples of what you have in mind?

Here are some examples of where bash(1), as of this year, uses its new
page-local `Q` macro.

----
.TP
.B \-\-dump\-po\-strings
Equivalent to \fB\-D\fP, but the output is in the GNU \fIgettext\fP
.Q po
(portable object) file format.
----
When the shell is in posix mode, it does not recognize
\fBtime\fP as a reserved word if the next token begins with a
.Q \- .
----
The element with index 0 is the name of any currently-executing
shell function.
The bottom-most element (the one with the highest index) is
.Q main .
----
Any numeric argument given to a \fBreadline\fP command that was defined using
.Q "bind \-x"
(see
.SM
.B "SHELL BUILTIN COMMANDS"
below)
when it was invoked.
----

Here's how I'd write these with QS/QE (ignoring other style preferences
of mine).

----
.TP
.B \-\-dump\-po\-strings
Equivalent to \fB\-D\fP, but the output is in the GNU \fIgettext\fP
.QS
po
.QE
(portable object) file format.
----
When the shell is in posix mode, it does not recognize
\fBtime\fP as a reserved word if the next token begins with a
.QS
\-\c
.QE
\&.
----
The element with index 0 is the name of any currently-executing
shell function.
The bottom-most element (the one with the highest index) is
.QS
main\c
.QE
\&.
----
Any numeric argument given to a \fBreadline\fP command that was defined using
.QS
bind \-x
.QE
(see
.SM
.B "SHELL BUILTIN COMMANDS"
below)
when it was invoked.
----

I _can_ foresee some objections.

1.  "Oh no!  I have to learn how to use `\c`!"

    Yeah.  Macros like `BR` and `IR` are able to conceal the necessity
    of that escape sequence from the man page author because they format
    their arguments.  Under the hood, `BR` for example could be
    implemented (rudimentarily) like this.

    .de BR
    .nr Of \\n(.f \" "old font"
    .ft B
    \\$1\c
    .ft R
    \\$2
    .ft \\n(Of
    ..

    QS/QE don't format their arguments because they don't take any.
    That in turn is because *roffs ignore macros they don't recognize,
    and so the text of their arguments is lost--it doesn't format.  If
    we want to avoid doing violence to man pages that get formatted on
    old systems or with old formatters that don't know QS/QE, using `\c`
    more often is part of the price.

    Increasing the use of `\c` also increases the pressure on me to go
    do something about po4a.[3]

2.  "Oh no!  I have to remember to use `\&` before `.` at the start of
    an input line!"

    You _already_ have to remember that.  But QS/QE might stimulate more
    occasions for recollecting it.
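
Both objections can be seen in a small fragment.  This is only a sketch,
assuming a formatter with no definition of `Q`, `QS`, or `QE` in effect
(for instance, `groff -Tascii` with no macro package loaded).  The
sentence is adapted from the bash(1) excerpts above, and the `br`
requests serve only to separate the three cases.

```roff
.\" Case 1: an unrecognized macro is ignored, arguments and all.
The bottom-most element is
.Q main .
.br
.\" Case 2: unrecognized QS/QE are ignored, but the text lines between
.\" and after them are still formatted.
The bottom-most element is
.QS
main\c
.QE
\&.
.br
.\" Case 3: without \&, a line consisting of "." is an empty control
.\" line, not text.
The bottom-most element is
.QS
main\c
.QE
.
```

Case 1 formats neither "main" nor the period.  Case 2 formats "main."
without quotation marks.  Case 3 formats "main" but drops the period.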

To address some of the use cases in bash(1), as I mentioned earlier, it
is necessary[4] for `QS` to support a Boolean argument to suppress
hyphenation of the first word in the quotation.

bash(1) today:

.TP 8
.B ignoreeof
The effect is as if the shell command
.QN "IGNOREEOF=10"
had been executed
(see
.B "Shell Variables"
.ie \n(zZ=1 in \fIbash\fP(1)).
.el above).

Under my proposal:

.TP 8
.B ignoreeof
The effect is as if the shell command
.QS 1
IGNOREEOF=10
.QE
had been executed
(see
.B "Shell Variables"
.ie \n(zZ=1 in \fIbash\fP(1)).
.el above).
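
For comparison, classic roff syntax already lets an author protect a
single word from hyphenation by starting it with the `\%` escape
sequence.  The fragment below is only a sketch of what the `1` argument
would presumably spare the author from typing; it is not part of the
proposal.

```roff
.\" Sketch: \% at the start of a word tells the formatter not to
.\" hyphenate that word.  QS takes no argument here.
The effect is as if the shell command
.QS
\%IGNOREEOF=10
.QE
had been executed.
```

Because `\%` is an escape sequence in a text line and not a macro
argument, a formatter that lacks `QS` still formats the word.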

> I think for consistency sticking to the short format is a good thing,
> unless at some point we find we need longer ones (but that would be a
> good point for saying we have enough macros in man(7) that the
> language is too fat).

I agree.  mdoc(7) illustrates the distance one can carry a two-letter
macro lexicon--a perhaps inadvisably long way.

Possible further enhancements
=============================

A.  Have `QS` accept second and third arguments specifying the quotation
    characters to use.  This is like mdoc(7)'s `Eo`/`Ec`, and would make
    QS and QE more general inline enclosure/bracketing macros.

    Press
    .QS 1 < >
    Enter
    .QE
    to continue.

    I'm leaning away from this, though.  (1) It doesn't seem quite in
    keeping with man(7)'s philosophy in a way I struggle to articulate
    (maybe Doug can help).  (2) It trades away the advantage of not
    losing text (apart from the quotation marks themselves, which
    historically man page authors don't bother to enter in their
    literal forms in the first place because they aren't sure how).
    Here, the characters that get dropped are not quotation marks per
    se, but '<' and '>', which people have been able to type and get
    formatted without trouble forever, unlike “ ” ‘ ’.  (In indulgence
    of "power users'" petulant rebellion against good typography,
    distributors hack up "man.local" to make it harder still to get ‘
    and ’.[5])  And (3) people might get carried away.

    .SY tbl
    .QS 0 [ ]
    .B \-C
    .QE
    .QS 0 [ ]
    .I file
    \&.\|.\|.
    .QE
    .YS

    I don't think that's an improvement on the status quo.

    .SY tbl
    .RB [ \-C ]
    .RI [ file\~ .\|.\|.]
    .YS

B1. Alternate double and single quotation marks with the parity of the
    nesting level.

    .QS
    I mean,
    if 10 years from now,
    when you are doing something quick and dirty,
    you suddenly visualize that I am looking over your shoulders and say
    to yourself
    .QS
    Dijkstra would not have liked this\c
    .QE
    ,
    well,
    that would be enough immortality for me.
    .QE

    Rendering on a typesetter or UTF-8 terminal:

    “I mean, if 10 years from now, when you are doing something quick
    and dirty, you suddenly visualize that I am looking over your
    shoulders and say to yourself ‘Dijkstra would not have liked this’,
    well, that would be enough immortality for me.”

    Rendering on an ASCII or Latin-1 terminal:

    "I mean, if 10 years from now, when you are doing something quick
    and dirty, you suddenly visualize that I am looking over your
    shoulders and say to yourself 'Dijkstra would not have liked this',
    well, that would be enough immortality for me."

B2. It would be trivial to support the British, who use the wrong
    quotation marks^W^W^W^W^Wdrive on the wrong side of the
    road^W^W^W^W^W^W^W^Whave a different quotation mark convention.  A
    documented rendering configuration register, akin to `LL`, `IN`, and
    `PO`, could invert the sense of nesting parity.  I imagine this
    would be another matter handled in "man.local".  In fact, since in
    groff we have `\V`, it could even be made sensitive to the locale
    settings of the process environment.  I don't think I'd bother in
    the stock configuration (and because I'm not an expert on which
    territories prefer the "other" quotation convention as strongly as
    Fleet Street does).  Distributions are more likely to have the
    relevant expertise.

    But this, too, may not be worth messing with, as even if the
    quotation marks are correct according to one's training/biases,
    everybody has to read nonstandard English spellings in man pages
    written on the other side of the ocean anyway, and that's probably
    more jarring.  Yet after 45 years of man(7), there's been little
    expressed user demand for man pages to offer bifurcated en_GB and
    en_US localized versions.  I've never seen anyone throw a fit about
    this, and I've seen the Phoronix forums.

    I really could go either way on this aspect of the macros.
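
The configuration register idea in B2 might look something like the
following in "man.local".  This is a hedged sketch only: the register
name `QP` and the string name `q*lang` are invented for illustration,
only `LANG` is examined (real locale resolution also involves `LC_ALL`
and the other `LC_*` variables), and the match is against a single
locale name.

```roff
.\" Hypothetical man.local fragment; QP and q*lang are invented names.
.\" QP = 0: double marks outermost.  QP = 1: single marks outermost.
.nr QP 0
.\" \V interpolates an environment variable (a GNU troff extension).
.ds q*lang \V[LANG]
.if '\*[q*lang]'en_GB.UTF-8' .nr QP 1
```

A `QS` implementation could then consult `\n(QP` together with its
nesting level to choose between double and single marks.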

Regards,
Branden

[1] https://invisible-island.net/scripts/man2html.html

[2] They initially called it `IM` and changed it to `MR` at my request.

    https://github.com/9fans/plan9port/commits/master/tmac

[3] I distracted myself with a vision of a grand solution that would
    solve a whole bunch of problems at once.  But that will take time to
    design (let alone deploy).  A spot fix for po4a's non-interpretation
    of `\c` is important, too.

    https://github.com/mquinson/po4a/issues/527

[4] Not _strictly_ necessary, but I am loath to encourage man page
    authors to experiment with `nh` and `hy` requests; that can only end
    in tears.

[5] "With Debian (and other distributors...) capitulating to pressure to
    override the meanings of these input characters once again, a cost
    is imposed on correctly composed pages that historically rendered
    well: whereas `foo' formerly reliably appeared as ‘foo’ everywhere
    directional single quotes were supported (and as 'foo' where they
    were not), now `foo' appears as `foo', making the page ugly and
    wrong.  (I know of no UTF-8 font for a terminal emulator that
    renders these glyphs as the ASCII standard ANSI X3.4-1968 depicts
    them; see <https://ia800800.us.archive.org/35/items/\
    enf-ascii-1968-1970/Image070917151315.pdf>.  The inconsistency of
    the unlettered, selectively ASCII-championing revanchist stance is
    nearly as frustrating as its ignorance.)"

    
    https://gitlab.com/procps-ng/procps/-/merge_requests/213/diffs?commit_id=a3ac4b667929320d4c8012435d63a9d1dd538a8d
