Re: CHECKSTYLE suggestions: unnecessary quotations and unnecessary \f escape
From: G. Branden Robinson
Subject: Re: CHECKSTYLE suggestions: unnecessary quotations and unnecessary \f escape
Date: Mon, 21 Mar 2022 01:07:11 +1100
User-agent: NeoMutt/20180716
Hi, Alex!
At 2022-03-19T17:07:09+0100, Alejandro Colomar (man-pages) wrote:
> While fixing style issues in the man-pages project,
> I'm finding a few recurrent issues that I think you could warn about:
>
> Unnecessary quotations:
>
> [
> .I "foo bar"
> .IR foo "bar"
> ]
That is going to be hard to detect from within a macro package. As
noted in our recent discussion of quotation marks in macro calls, by the
time these arguments get to the `I` and `IR` macros, those macros have
no way of knowing whether they were excessively quoted in the calling
context.
I don't have a solution for this problem. Solving it would require
modifying GNU troff's input parser to track some kind of "extraneous
quote" state. Since, as we saw in our earlier discussion, a sequence of
up to four double quotes can be perfectly valid, my intuition is that
this problem is worse than regex-hard, and the cost might rapidly
outweigh the benefit.
If you need this, it's probably better to just write a regex-based tool
that scans the man page source. You can then enforce a stricter
discipline, permitting false positives on valid but unusual constructs
that would be better recast.
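
As a rough illustration of such a regex-based scan, here is a minimal
sketch. It is hypothetical: the function name `find_needless_quotes`,
the lists of macros, and the two patterns are assumptions chosen to
match the examples quoted above, and it deliberately accepts false
positives on valid but unusual quoting.

```sh
#!/bin/sh
# Hypothetical sketch; the function name and macro lists are assumptions.
# Reads man page source on standard input and prints "LINE: TEXT" for
# each macro call whose quotation marks look unnecessary.
find_needless_quotes() {
    awk '
    # Single-font macros join their arguments with spaces, so quoting
    # the entire argument list is seldom needed.
    /^[.][ \t]*(B|I|SM|SB)[ \t]+"[^"]+"[ \t]*$/ {
        print NR ": " $0
        next
    }
    # Alternating-font macros: a quoted argument that contains no
    # whitespace usually formats the same without the quotes.
    /^[.][ \t]*(BI|BR|IB|IR|RB|RI)[ \t]/ &&
    /(^|[ \t])"[^" \t]+"([ \t]|$)/ {
        print NR ": " $0
    }
    '
}
```

It could be run as `find_needless_quotes < page.man`. Empty arguments
(`""`) and quoted arguments that contain spaces in alternating-font
calls are left alone, since those quotes do real work.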
> Unnecessary escape \f:
>
> [
> foo \fIbar\fP baz
> ]
>
> The last one is more difficult to decide when it's unnecessary, but
> you could maybe start with non-formatted lines.
This is also a big challenge and, on first reflection, an even worse
one, as you suspect. The problem is that what you quote is an ordinary
text line, and *roffs don't generally look very far ahead when parsing.
There aren't many ways in the language to peek ahead in the input
stream.
The only way I can think of would be to set up the macro package such
that all text lines get captured into a macro or diversion. You might
then be able to iterate through the stored content somehow--though I
don't know off the top of my head a way to do this line by line. I also
don't know how to save a pending input line into a string for
processing with the few simple requests we have for that. There's also
the problem of interpreting that input well enough to recognize
undesirable constructs--do you want to write a troff in troff?
Again, I would attack this with a less perfect but much more tractable
regex-based input scanner. I would filter out tbl(1) regions and then
flag _any_ font selection escape sequence that isn't on a control line,
meaning a line starting with '.' (that's an over-crudification[1], but I
predict that it will work well for most pages). I'm attaching a shell
script I've come up with to do this. For groff's own pages, it mostly
turns up use of non-man(7)-standard fonts (not roman, bold, or italic)
and some pages I haven't yet done a thorough revision on.
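
For readers without the attachment, the approach described above can be
sketched roughly as follows. This is not the attached script; it is a
hypothetical reconstruction of the idea, and the function name
`find_font_escapes` and the treatment of `.TS`/`.TE` as the only table
delimiters are assumptions. Like the description, it treats only lines
starting with '.' as control lines.

```sh
#!/bin/sh
# Hypothetical sketch, not the script attached to this message.
# Reads man page source on standard input and prints "LINE: TEXT" for
# each text line outside a tbl(1) region that contains a \f escape.
find_font_escapes() {
    awk '
    /^[.][ \t]*TS/ { in_tbl = 1 }           # table region starts
    /^[.][ \t]*TE/ { in_tbl = 0; next }     # table region ends
    in_tbl         { next }                 # ignore table contents
    /^[.]/         { next }                 # ignore control lines
    /\\f/          { print NR ": " $0 }     # font selection escape
    '
}
```

It could be run as `find_font_escapes < page.man`. Lines that use the
no-break control character or continue a control line are not treated
specially, so they may show up as false positives.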
Regards,
Branden
[1] no-break control character, line continuation, yadda yadda yadda
find-font-escapes.sh
Description: Bourne shell script