Re: [groff] [patch] modernize -T ascii rendering of opening single quote


From: G. Branden Robinson
Subject: Re: [groff] [patch] modernize -T ascii rendering of opening single quote
Date: Thu, 21 Feb 2019 22:10:15 +1100
User-agent: NeoMutt/20180716

At 2019-02-21T11:18:23+0100, Ingo Schwarze wrote:
> No problem, this wasn't an urgent matter, but instead merely an
> attempt at long-term cleanup.
[...]
> Thanks for describing your view on the matter.  It indeed clarifies
> the situation because it effectively adds you to the OPPOSED line,
> reducing the chance that someone might regard the situation as an
> overwhelming majority.

My opposition is not strong; I simply think that ASCII--true ASCII--the
seven-bit code with no printing characters at 0-0x20 and 0x7F--was
ambiguous about the 0x27 and 0x60 codepoints in the first place.

Regarding Latin-1, I think it's of greater relevance than we may at
first suspect...I'll quote you but come back to the subject after
pulling in Jeff and John.

> I consider the latin1 output device utterly obsolete and don't think
> it will be many years before it can be deleted outright.  Of course
> there is no rush to delete it, it doesn't eat any bread.
> 
> Being able to process LATIN-1 input is still relevant because various
> legacy documents exist (including substantial numbers of manual
> pages) that are encoded in LATIN-1.  But producing LATIN-1 *output*?!
> Gimme a break!
> 
> The only reason i included the latin1 device in the patch was to
> keep it consistent with the ASCII device as long as it is still in
> the tree - but if you want to maintain it independently, fine with
> me, i certainly don't object to that, do whatever you want with
> it...

At 2019-02-21T10:01:07+0000, Jeff Conrad wrote:
[Re: `quotes like this']
> I never did this with a mechanical (or electric) typewriter ...

Me neither.  The mechanical typewriters I was exposed to as a youngster,
whether manual[1] or IBM Selectrics[2], consistently lacked a grave
accent key as far as I can remember.  Many computer keyboards in the
8-bit era did not provide enough key mappings to access the full range
of printable ASCII codes, either, and the grave accent keycap in
particular was a scarce thing.

> The Chicago Manual of Style, 13th ed., says “a single quotation mark,
> however, should not be used to indicate an accent, because it could be
> either a grave or an acute accent” (2.14, p. 42)—so I guess it assumed
> typewriter single quotes aren’t symmetrical.

Not paired, I think you mean.  A vertical apostrophe would certainly be
perfectly ambiguous with respect to a grave or acute accent.

> We still have the question about what is (or was) a Genuine ASCII™
> device, since different manufacturers had different implementations.  I
> must admit that, after consulting an HP 2392 manual, I see that it used
> the HP Roman 8 character set, so I probably had a device that displayed
> symmetrical left and right quotes until at least the late 1980s.

If we actually have a userbase for ASCII output devices, I think we have
only 2 choices:

1. Provide 2 different "devascii"s to suit the different conventions; or
2. Give devascii a man page and tell people how to hack up their dot
   files to override our default.

Maybe devascii should have a man page anyway, to explain this annoying
issue.

> This leaves the impression that the devices Knuth and Kernighan and
> their associates used rendered symmetrical left and right quotes.

I don't know about Knuth but it pays to keep in mind that, reportedly,
Bell Labs never had glass TTYs.  They used paper teletypes for years and
years and then leapfrogged straight to graphical terminals with the
Blit.

> Looking at The Unix Programming Environment (1984), Chapter 9,
> Document Preparation, it’s also apparent that double quotes were
> literal renderings of “``word''” (‘‘word’’), printed on a Mergenthaler
> Linotron 202.  The same impression obtains from The C Programming
> Language (1978), Preface; this was printed on a Graphic Systems
> (C/A/T?) phototypesetter.

Yes, and it is a shame that the number of people _reading_ these books
and applying them to practice on glass screens was probably multiple
orders of magnitude larger than the number--including the authors--who
did daily work on noisy, grindy, smelly teletypes.

> But is this literal or just markup?  For the AT&T folks, perhaps the
> former.  With TeX, “``word''” gets mapped to “word”.  All I can say is
> that it’s a lot easier to type (and read) “``word''” than
> “\(lqword\(rq”.  I wish roff did the same thing (I still often use the
> “traditional” markup and do the conversion with a front-end script).

I admit that the ergonomics of the character escapes are poor.  When I
do my man page work I slog through it.
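
As a rough sketch of the kind of front-end conversion Jeff describes
(his actual script is not shown in this thread, so the name
`convert_quotes` and the decision to leave control lines alone are
assumptions made purely for illustration), something like the following
Python would do the job:

```python
import re

# Matches ``text'' within a single line.  The non-greedy group means two
# quoted phrases on the same line are converted separately.
PAIRED_QUOTES = re.compile(r"``(.+?)''")


def convert_quotes(line: str) -> str:
    # Leave roff control lines (requests and macro calls) untouched.  This
    # is a simplifying assumption: macro arguments can contain quotes too.
    if line.startswith((".", "'")):
        return line
    # A function is used as the replacement so that the backslashes of the
    # special character escapes are inserted literally.
    return PAIRED_QUOTES.sub(lambda m: "\\(lq" + m.group(1) + "\\(rq", line)
```

Applied line by line to the input before it reaches the formatter, this
keeps the source easy to type while still producing \(lq and \(rq in
text lines.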

> > 2.  I _would_ change the latin1 device because there is no rational
> > defense of 0x60 ` as an opening quote in Latin-1 (a.k.a. ISO 8859-1).
> > In Latin-1 (and I think all the other ISO Latin alphabets, including
> > ISO 8859-15), 0x60 is the GRAVE ACCENT and is never[4] mirror-symmetric
> > with 0x27 ' APOSTROPHE but is instead mirror-symmetric with 0xB4 ´ ACUTE
> > ACCENT.
> 
> Hard to disagree with this; as it stands, it seems like it’s always been
> wrong, because I don’t think ISO 8859-1 or ECMA 6 ever provided options
> ECMA 6 did speak of using APOSTOPHE

Sic?  Did they really misspell it?  If so it's definitely going into my
arsenal of mockery on this topic.

> > On truly ASCII output devices, 0x60 GRAVE ACCENT _might_ be a
> > directional single quotation mark that pairs with 0x27; it therefore
> > makes sense to map \[oq] to 0x60.
> 
> But the operative word is _might_, so you’ve got to ask yourself one
> question: Do I feel lucky?

That's right.  No one will know except the owner of the device that is
purportedly limited to ASCII.

> More practically, how many users are likely to get crummy output on a
> Genuine ASCII™ device if the change is made versus how many are
> getting crummy output with things as they stand?  No question that
> devutf8 is the preferred way to go, but it’s not always an option for
> everyone.

No, but 8-bit encodings are what is actually used on everything with a
glass screen that _isn't_ Unicode-capable.

That's why I find the ASCII-puritanical position disingenuous (or
perhaps simply ignorant): first, ASCII was sadly semantically unchaste
from the start when it came to 0x27 and 0x60, and second, the
people complaining about the proposed change almost assuredly _aren't
using ASCII devices_.  They're using something considerably more
powerful, and are viewing characters like "á" right now without
difficulty.

My apologies to anyone who has to occasionally change the ribbon on his
display device.

> > On a Latin-1, Windows-1252, or Unicode device, the foregoing WILL NOT
> > be true.  Latin-1 lacks directional quotation marks altogether, and the
> > other two encodings have dedicated code points for them, respecting 0x60
> > in its sole role as a spacing grave accent diacritic.  On none of these
> > will 0x27 APOSTROPHE be a copy of 0xB4 ACUTE ACCENT.
> 
> This raises another question: why not a devcp1252?  Many browsers treat
> it as a de facto superset of ISO 8859-1, but capriciously adding
> characters from the C1 area to devlatin1 is probably a bad idea.

Apart from the development effort, there's no reason not to, especially
if the Windows console these days actually supports it.  If it doesn't,
then supporting code page 437 in addition might make sense.
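
The code-point facts quoted above are easy to check against Python's
standard codecs.  This is only a sketch: the helper name `can_encode` is
made up for illustration, and Python's "ascii", "latin-1", and "cp1252"
codecs stand in for the corresponding output devices.

```python
import unicodedata


def can_encode(char: str, encoding: str) -> bool:
    """Return True if `char` has a code point in `encoding`."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Standard names of the three code points under discussion.
for code in (0x27, 0x60, 0xB4):
    print(f"0x{code:02X} {unicodedata.name(chr(code))}")

# Directional single quotes are U+2018 (left) and U+2019 (right).
for encoding in ("ascii", "latin-1", "cp1252"):
    supported = can_encode("\u2018", encoding) and can_encode("\u2019", encoding)
    print(f"{encoding}: directional single quotes encodable: {supported}")
```

Of the three encodings, only cp1252 has room for the directional
quotes; the other two have nothing to offer but 0x27 and 0x60.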

It might be worth it to do this to disabuse people of the utterly false
notion that their display device is "ASCII".  Again, if they're staring
at glass while reading this the very strong likelihood is that they 'kin
well are not.

Even people on DEC VT220 and later dumb terminals aren't; by then DEC
had the 8-bit Multinational Character Set alongside its 7-bit National
Replacement Character Sets.  (So if you're on a VT100/102, let's hear
from you.  What do _your_ grave accents and apostrophes look like?)

> In all the excitement here, I created such a device and it works fine.
> I run a Windows environment that has some issues with UTF-8, and this
> allows reasonable rendering of most characters,

Neat!

I have little love for Microsoft but I find Windows-1252 to be a useful
test for minimal acceptable glyph coverage in a font, and as you point
out [snipped], it can still be considered deficient even for
elementary mathematical typography.

At 2019-02-21T21:22:42+1100, John Gardner wrote:
> If my opinion matters in this discussion, then I'm tentatively opposed to
> this change. Reasons:
> — Folks limited to ASCII environments may be using a screen font with
> more suitable-looking quotes (e.g., Gallant)

I think that is a spurious reason.  Few people are actually limited to
ASCII environments.  They're calling whatever 8-bit legacy character
encoding they're using "ASCII", a sin with a long and sordid history.

Ingo marked me as opposed but I'm pretty close to backing the change
anyway just so we can smoke these people out.

> — Regression tests that assert man(1) output will break on systems
> with a modern groff(1) installed

As Ingo noted, man pages have many problems, and sometimes the corpus
reveals problems because the man pages were generated by tools written
by people who don't really understand *roff, let alone the subtleties of
character encoding standards.
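
For test suites that compare formatted man(1) output, one way to
survive a change like this is to fold the possible quote spellings
together before comparing.  A minimal sketch, assuming the tests are
driven from Python; `normalize_quotes` is a hypothetical helper, not
part of groff or of any existing test suite:

```python
# Characters a nroff device might emit for a single quote, all folded to
# the plain apostrophe: grave accent, acute accent, and the directional
# quotes U+2018 and U+2019.
QUOTE_FOLD = str.maketrans({
    "`": "'",
    "\u00b4": "'",
    "\u2018": "'",
    "\u2019": "'",
})


def normalize_quotes(text: str) -> str:
    """Fold single-quote variants so output comparisons ignore them."""
    return text.translate(QUOTE_FOLD)
```

The folding is deliberately lossy (a literal backquote in a shell
example is folded as well), so it suits comparisons of rendered text
rather than anything that must round-trip.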

Debian's already been doing the following for years:

/etc/groff/man.local:
.if n \{\
.  \" Debian: Map \(oq to ' rather than ` in nroff mode for devices other
.  \" than utf8.
.  if !'\*[.T]'utf8' \
.    tr \[oq]'
.\}

> — We're applying polish to a +50-year old documentation system whose
> biggest feature is *remaining unchanged.*

I disagree with that statement too.  Remaining unchanged is _not_, in
and of itself, a virtue.

Being able to faithfully reconstruct documents of historical value,
however one chooses to interpret that, _is_ important.

We _can_ change things if we find compelling reasons to do so.  I would
stipulate only that we give archivists enough information so that they
can do their jobs.

https://en.wikipedia.org/wiki/Digital_dark_age

Regards,
Branden

[1] See attachment 1.
[2] See attachment 2.

Attachment: manual_typewriter.jpeg
Description: JPEG image

Attachment: ibm_selectric.jpeg
Description: JPEG image
