Re: ripgrep author seems happy with groff_man

From:	Ingo Schwarze
Subject:	Re: ripgrep author seems happy with groff_man_style(7)
Date:	Wed, 22 Jan 2025 22:04:59 +0100

Hi Branden,

G. Branden Robinson wrote on Mon, Jan 20, 2025 at 11:59:41PM -0600:

> We need a design for automatic construction of
> tag/anchor names from the user-specified names of the items to be
> tagged.  In man(7) documents, those taggable items are probably going to
> be:
> 
> 1.  the identifier of the page itself, with "section" number;
> 2.  section heading text;
> 3.  subsection heading text; and
> 4.  the tag text of tagged paragraphs (`TP`).

In addition to those, mandoc(1) also tags the tag text of .IP and .TQ.
But for all these, it does some sanitization before creating a tag,
see this comment in the file man_validate.c:

/*
 * Skip leading whitespace, dashes, backslashes, and font escapes,
 * then create a tag if the first following byte is a letter.
 * Priority is high unless whitespace is present.
 */

The "letter" condition is needed because .IP and .TP are also used
for bullet and numbered lists, and in those cases, the tag is often
something like "-", "\(en", "*", "\(bu", "1.", "2)" etc. which
we clearly don't want to tag.
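
A minimal sketch of that rule in Python, not mandoc's actual C code.
The name `tag_candidate` and the regular expression are hypothetical.
Only the font escape forms \fX, \f(XX and \f[...] are handled, and the
sketch assumes that "whitespace is present" means whitespace anywhere
in the tag text and that the tag ends at the first blank or backslash.

```python
import re

# Hypothetical pattern for roff font escapes: \fB, \f(BI, \f[B].
FONT_ESCAPE = re.compile(r"\\f(?:\[[^\]]*\]|\(..|.)")


def tag_candidate(text):
    """Return (tag, priority), or None if the text should not be tagged."""
    i = 0
    while i < len(text):
        m = FONT_ESCAPE.match(text, i)
        if m:
            i = m.end()              # skip a font escape
        elif text[i] in " \t-\\":
            i += 1                   # skip whitespace, dashes, backslashes
        else:
            break
    # Only tag if the first remaining byte is a letter.
    if i >= len(text) or not text[i].isalpha():
        return None
    # Assumption: the tag runs up to the next blank or backslash.
    tag = re.match(r"[^\s\\]+", text[i:]).group()
    # Assumption: any whitespace in the tag text lowers the priority.
    priority = "low" if any(c.isspace() for c in text) else "high"
    return tag, priority
```

Under these assumptions, r"\(bu" and "1." yield None, so list markers
stay untagged, while r"\fB\-\-help\fR" yields ("help", "high").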

> A.  Generation of _unique_ hyperlink tags from #2-#4 above.

Don't, just don't, for the reasons explained in my other mail.

[...]
> C.  We then need a way to make references to these anchors/tags.

Please do not rush like that.  As explained in my other mail, that's 
actually only useful in surprisingly rare cases.  Better first
get the simpler case of local jumping well-designed and stable
before progressing to the much, much harder next stage of non-local
jumping.

>     For man(7) the `MR` macro new to groff 1.23 was an obvious site
>     to add the appropriate machinery for document-level links.
>     mdoc(7)'s `Xr` is closely analogous and has existed for many years.

Yes, both have almost identical semantics and are a likely candidate
for extension, if we come to the conclusion an extension is needed.
I didn't consider the details yet, though.

>     i.  No way to hyperlink in a more fine-grained way, that is to
>         (sub)section headings or, conceivably, to paragraph tags.  This
>         is a tougher problem because if these are not unique within a
>         page, the location making the link has to know about the
>         structure of the document.  Possibly, we'll just punt on the
>         issue of "deep" cross-document links.

Punt for now, yes; maybe we can find a good solution later,
when the easier parts are done.

One possible solution is to just ask authors to engage their brain
before deep linking.  It should be fairly obvious that deep linking
to the tag "h" in the ksh(1) manual page is a stupid idea.  Even if
there weren't three instances of that tag already (for three
completely different features), everybody will expect that more
such instances can easily pop up at any time and make your shiny
new link point into the woods.

On the other hand, linking to the tag "CIPHER_LISTING" or the
tag "EVP_get_cipherbyname" in EVP_EncryptInit(3) is almost
certainly fine because it's hard to imagine a scenario where
those tags might become ambiguous in the future, see

  https://man.openbsd.org/EVP_EncryptInit.3
  https://man.openbsd.org/EVP_EncryptInit.3#CIPHER_LISTING
  https://man.openbsd.org/EVP_EncryptInit.3#EVP_get_cipherbyname
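
The ambiguity argument can be illustrated with a small uniqueness
check.  This is only a sketch in Python: `link_target` is a
hypothetical helper, and the tag lists passed to it are made-up
stand-ins for whatever tags a formatter would emit for a page.

```python
from collections import Counter


def link_target(page_url, tag, page_tags):
    """Return a deep link only if tag occurs exactly once in page_tags.

    page_tags is a hypothetical list of all tags emitted for the page;
    an ambiguous or missing tag falls back to the top of the page.
    """
    if Counter(page_tags)[tag] == 1:
        return f"{page_url}#{tag}"
    return page_url


url = "https://man.openbsd.org/EVP_EncryptInit.3"
# Made-up tag list containing the two tags named in the text.
tags = ["CIPHER_LISTING", "EVP_get_cipherbyname"]
print(link_target(url, "CIPHER_LISTING", tags))
```

Such a check only describes the page as it is today.  A tag like "h"
that happens to be unique now can still gain a second instance later,
so the check does not replace the author's judgement.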

In any case, it's important that the tag names exactly match
the actual syntax elements, such that users can type them
without any prior knowledge.  Invented or constructed tag names
are next to useless.

I'm not sure you have exhaustively analyzed cross-document linking,
mostly because i definitely haven't analyzed cross-document linking
in manual pages exhaustively myself.

But i'm aware of at least two aspects you maybe missed:

 1. While you discussed tag generation (incompletely),
    tag format (incompletely and IMHO in part misguided)
    and link display, the purpose of a link is being followed.
    As the first step, that requires the user to select the
    link.  For HTML output, it is obvious how that works:
    in a graphical browser, click the link with the mouse or
    navigate to it with the keyboard (the latter probably
    being the method of choice if you are using a screen
    reader - though i'm not sure because i'm not blind and
    have not talked that much to blind users).  In a text
    browser navigate with the keyboard.
    How is the user supposed to select a link in less(1)?
    That looks like a problem requiring considerable design
    and implementation effort even if you are a less(1) hacker.

 2. The purpose of selecting a link is displaying the target.
    For HTML output, it's obvious how that works because that's
    what hypertext was designed for in the first place:
    when a link is selected, close the current document and
    open the target one, or optionally open the target in a
    new tab or window if the browser and/or window manager
    support that and the user wants it.
    For mandoc(1), implementing a selection mechanism would
    actually not be all that difficult.  When the user selects
    a link, mandoc can simply close the current file, look
    up the desired target in the mandoc.db(5) database to
    retrieve the file system path to the desired manual page
    source file, open that file, parse it,
    generate a new tags file from it, format it, and spawn
    a new pager process passing the file names of both
    temporary files.  Really not rocket science at all.
    But groff does not know about mandoc.db(5), so even when it
    knows that it is looking for the "h" tag in the ksh(1)
    manual page, it will have a very hard time figuring out
    where in the file system to look for the file "ksh.1"
    (if that is even the name in the file system!).  Once
    it has the file, it can maybe do the parsing and formatting
    to produce the two files, though i'm not sure because so
    far, i don't think it has infrastructure to manage
    temporary files for such purposes.  And then what?
    Spawn less(1)?  At least so far, groff(1) never does that.
    If all this were solved, wouldn't that make the man-db
    package obsolete?  Do you really feel that close to
    obsoleting man-db, or incorporating it into groff?

 3. Then there is the following particularly interesting
    special case.  The mandoc implementation of man(1)
    already supports the command

      $ man EVP_get_cipherbyname

    even though there is no file EVP_get_cipherbyname.3 anywhere
    in the filesystem.  It opens the manual page EVP_EncryptInit(3)
    at the top, which documents EVP_get_cipherbyname further down.
    Traditional man(1) implementations like BSD man, Eaton man,
    man-1.5, man-1.6, and man-db support essentially the same with
    symbolic or hard links or one-line files containing .so requests
    on the file system level.

    In mandoc, it would be trivial to make
    "man EVP_get_cipherbyname" jump straight to the location of
    https://man.openbsd.org/EVP_EncryptInit.3#EVP_get_cipherbyname
    even in terminal output.  Is that desirable?  Likely not.
    Does that mean "EVP_get_cipherbyname" is a tag like any
    other even in the page also known as EVP_get_cipherbyname(3)?
    Likely neither.  For example, it might be useful for less
    to assume in that case that the user typed

      /EVP_get_cipherbyname<ENTER>g

    to search for the target function name such that it gets highlighted
    in the text, then return to the top with the less(1) 'g' command.
    I didn't really think about that yet.  It seems like that
    will also need careful consideration and design.  How does that
    (still completely unexplored) picture change with deep linking?

 4. I almost certainly did not find all design gaps.
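
The name lookup from points 2 and 3 can be sketched as follows.  This
is not how mandoc.db(5) is actually stored or queried: `INDEX` is a
hypothetical in-memory stand-in, the path in it is made up, and
`resolve` is an invented helper.  It only shows that several names can
resolve to one file, and that a name differing from the file's own
name is a candidate for an in-page jump.

```python
import os.path

# Hypothetical stand-in for a name-to-file index; the path is made up.
INDEX = {
    "EVP_EncryptInit": "/usr/share/man/man3/EVP_EncryptInit.3",
    "EVP_get_cipherbyname": "/usr/share/man/man3/EVP_EncryptInit.3",
}


def resolve(name, index):
    """Return (path, tag) for name, or None if the name is unknown.

    tag is None when name is the page's own file name, meaning "open
    at the top"; otherwise it is the name itself, which a viewer could
    use as a jump target or as a search pattern.
    """
    path = index.get(name)
    if path is None:
        return None
    stem = os.path.splitext(os.path.basename(path))[0]
    return path, (name if name != stem else None)
```

Whether the viewer should actually jump to the returned tag, search
for it, or ignore it is exactly the open design question from point 3.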

So i suspect before this can become useful in practice, there
is still some very serious design work that needs to be done.

Yours,
  Ingo
