Re: [groff] Mapping of \(bu to MIDDLE DOT
From: Jeff Conrad
Subject: Re: [groff] Mapping of \(bu to MIDDLE DOT
Date: Thu, 28 Mar 2019 11:25:52 +0000
On Thursday, March 28, 2019 3:01 AM, G. Branden Robinson wrote:
> At 2019-03-27T04:34:18+0000, Jeff Conrad wrote:
> > Is there a reason that tty.tmac translates \(bu to \(pc or \(md
> > regardless of the output device or whether \(bu is available?
> >
> > .ie c\[pc] \
> > . tr \[bu]\[pc]
> > .el \
> > . if c\[md] \
> > . tr \[bu]\[md]
>
> Are you looking at an old implementation? There's some important
> context missing here:
Yep; I'm using 1.22.3. Running Windows, I've had to diddle a few things,
so the upgrade isn't as simple as it could be.
> $ nl /usr/share/groff/1.22.4/tmac/tty.tmac | sed -n '14,21p'
> 14 .if !'\*[.T]'utf8' \{\
> 15 . ie c\[pc] \
> 16 . tr \[bu]\[pc]
> 17 . el \
> 18 . if c\[md] \
> 19 . tr \[bu]\[md]
> 20 .\}
> 21 .
>
> It sure seems like you might be re-reporting a problem Carsten Kunze
> raised in June 2015, and which prompted Werner to wrap the conditional
> you mention in an "if device is not UTF-8" block:
> https://lists.gnu.org/archive/html/groff/2015-06/msg00040.html
Again, yep; I used the wrong search query ...
> Really we shouldn't be conditional on UTF-8 per se, but on the existence
> of the bullet glyph in the font for the tty device.
Completely agree.
> However, the tty device ignores fonts ...
> these devices can report their character repertoire up to an
> application. VGA-style console devices, framebuffer consoles, and GUI
> terminal emulators can even change these on the fly. (Who else
> remembers live-hacking the display font in MS-DOS?)
We're obviously at the mercy of the chosen font (on Windows, I use
Lucida Console as the best of very limited options). But the device at
least gives us a reasonable idea of what's possible.
> So Werner's fix worked because there were (and are) no nroff/tty devices
> in the groff tree that supported the bullet character _except_ -Tutf8.
>
> My recommendations are:
> 1) Upgrade to groff 1.22.4; and
> 2) Change the conditional on line 14 of tty.tmac from:
>
> 14 .if !'\*[.T]'utf8' \{\
>
> to:
>
> 14 .if !c\[bu] \{\
>
> ...and tell us if that fixes your problem.
Making this change (which I've already done) indeed fixes things.
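For reference, with that change applied to the 1.22.4 lines quoted
above, the block would read roughly as follows. This is only a sketch:
the first request is the suggested replacement, the rest is copied from
the quoted listing, and the comments and indentation are added here for
illustration.

```roff
.\" Translate \[bu] only when the output device lacks a bullet glyph.
.if !c\[bu] \{\
.\" Prefer \[pc]; otherwise fall back to \[md] if it exists.
.  ie c\[pc] \
.    tr \[bu]\[pc]
.  el \
.    if c\[md] \
.      tr \[bu]\[md]
.\}
```

Testing for the glyph itself, rather than for the device name, means the
translation is skipped on any device that provides \[bu], not just
-Tutf8.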
> Personally, I advocate incorporating cp1252 into groff. It's only an
> 8-bit character set, should therefore be a low maintenance burden, and
> really should make life a bit more bearable for groff's Windows users.
> And that's good PR for groff, GNU, copyleft, and Free Software.
It's yours for the asking; it's really just latin1 with the additional
characters that Microsoft added to the C1 area. I went a bit further
and added spelled-out representations of missing Greek characters (I
hate missing symbols; in the old, old days, I guess one would print the
document and write in the missing symbols. Yeah, right ...). But if
these additions aren't for everyone, they're easily deleted.
> > Even for -Tlatin1, I'd prefer an asterisk or even the age-old
> > overstruck '+' and 'o'. Isn't the general rule for nroff to make the
> > best possible visual approximation when the true character isn't
> > available?
>
> As noted above, knowing what will actually show up on the output
> device is, in principle, impossible for nroff/tty output devices.
The user needs to pick the most appropriate font; there don't seem
to be all that many choices that we need to worry about.
> However, we can generally assume that users of 8-bit encodings will
> have comprehensive fonts available by default--they'd have to go out
> of their way to avoid them.
But 8-bit encodings (e.g., ISO 8859) have their limitations; in
particular, they're missing most of the common punctuation characters
used in typesetting. The MS extensions addressed most of this.
> Life is harder in UTF-8 world.
Yep. Especially on Windows. I had to hack the devutf8 font files to
use U+002D rather than U+2010 for a hyphen, because Lucida Console
doesn't include the latter. Ya do what ya gotta do ...
But Microsoft are working on it ...
https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
Skip to "Are we there yet?" near the end if you're less than fascinated
with the topic.
> To get that asterisk:
>
> In your documents, or your .troffrc, could you not do this?
>
> .fchar \[bu] *
Yes. I've already done something similar. But this won't help with the
few files I generate for general distribution. For example, for GNU
units, we generate a man page from texinfo source with a perl script,
and obviously can't assume a customized .troffrc, so we include a few hacks
to override some groff settings (e.g., ".tr \(oq'"). We actually don't
even assume groff, so we try to cover all the bases; this probably is
overkill nowadays.
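As a rough sketch of that kind of hack (hypothetical, and not the
preamble the units man page actually uses), a single guarded
translation in the same style as the \(oq example sticks to requests
that do not depend on groff:

```roff
.\" Hypothetical man page preamble line, not taken from GNU units.
.\" In nroff mode only, print an asterisk wherever \(bu appears.
.if n .tr \(bu*
```

The trade-off is that this also replaces the real bullet on terminal
devices that have one, such as -Tutf8; avoiding that would need groff's
"c" conditional, which is a groff extension.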
> As a minor point, I do think the existing fallback should be reversed in
> order:
>
> From:
>
> .fchar \[bu] \z+o
>
> To:
>
> .fchar \[bu] \zo+
Interesting how we differ on this. I don't like either alternative, but
find the 'o' more instantly recognizable; it's sorta kinda a circle. As
I recall, the AT&T version 2 nterm files that I had in the late 1980s
had it as you suggest, and I reversed it. I guess it's a matter of
personal preference. The asterisk avoids the problem.
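To compare the two orderings on a given terminal, a small test file
along these lines can be run through nroff (for example with -Tascii).
It is only a sketch: the labels are illustrative, and which character
ends up visible depends on how the device handles overstriking.

```roff
.\" \z prints the next character without advancing, so the character
.\" that follows it lands on the same position.
.nf
\z+o  plus first, then o
\zo+  o first, then plus
```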
> The \z+o status quo seems to follow a pattern that makes sense for
> modified letterforms, i.e., \z'a; on a 7-bit ASCII, non-overstriking
> device, you want the "a" to "win", because it carries the more important
> semantic information.
In general, I completely agree.
> That reasoning does not hold for bullet substitutes, which simply need
> to stand out graphically (your argument for not using a middle dot or
> centered period, which may be as small as one pixel on some devices),
> and not be semantically confusable with text.
In this circumstance, I don't know whether we can really separate
graphics and semantics.
> As "o" is actually a word (even in English, though much more prominently
> in Spanish), I find the present arrangement unfortunate.
I think it's largely a matter of context. As the tag for a list, I
think confusion would be unlikely. And again, an asterisk (perhaps ugly
but arguably the most common ASCII approximation of a bullet) would seem
to avoid the problem.
In my senior year of high school, I had an English teacher (a PhD) who
tried to drill into us that the "best" English is that which provides
the maximum communication (and it generally avoids pompous polysyllabic
pronouncements). I suggest something similar for the "best" groff. Of
course, it's not always easy to reach consensus on the details.
Regards,
Jeff