From: G. Branden Robinson
Subject: Re: [PATCH v2 3/4] zic.8: Use correct escape sequences instead of special characters
Date: Sat, 26 Nov 2022 15:19:47 -0600
Hi Paul,
At 2022-11-25T18:31:02-0800, Paul Eggert wrote:
> On 2022-11-23 10:43, Paul Eggert wrote:
> > I installed that
> Further testing showed that the installed patch doesn't work with
> traditional troff, which doesn't support groff escape sequences like
> \(aq.
I think this patch goes too far in the retrograde direction.
\(xx, where xx is any two characters, is not a groff extension. It
comes from Ossanna troff all the way back in the mid-1970s.
It is a special character escape sequence; a groff way of spelling it
is \[xxx] where xxx can be of any nonzero length (but cannot contain a
closing square bracket).
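For illustration only, here is a minimal Python sketch that pulls special character identifiers out of a line of roff input, accepting both the two-character \(xx form and groff's bracketed \[xxx] form. The function name special_chars is hypothetical; nothing like it ships with groff. The sketch assumes backslash is the escape character and ignores copy mode, the .ec request, and the argument syntax of other escapes. It relies only on the fact that a special character escape starts with a backslash immediately followed by "(" or "[".

```python
from __future__ import annotations


def special_chars(line: str) -> list[str]:
    """Return special character identifiers used in one line of roff input.

    Simplified sketch (hypothetical helper): assumes backslash is the
    escape character and ignores copy mode and the .ec request.
    """
    found: list[str] = []
    i, n = 0, len(line)
    while i < n:
        if line[i] != "\\":
            i += 1  # ordinary text
            continue
        if i + 1 >= n:
            break  # trailing backslash; nothing follows it
        kind = line[i + 1]
        if kind == "(":
            # Ossanna-style \(xx: exactly two characters follow.
            if i + 4 <= n:
                found.append(line[i + 2:i + 4])
            i += 4
        elif kind == "[":
            # groff-style \[xxx]: a non-empty name that cannot contain "]".
            end = line.find("]", i + 2)
            if end == -1:
                break  # unterminated bracketed escape
            if end > i + 2:
                found.append(line[i + 2:end])
            i = end + 1
        else:
            # Any other escape (\e, \\, \f, \n, ...): skip its one-character
            # name. Its arguments are then scanned as ordinary text, which
            # suffices here because a special character escape can only
            # begin with a backslash.
            i += 2
    return found


line = r".ie \n(.g .q \f(CR!$%&\(aq()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP ."
print(special_chars(line))  # ['aq', 'ha', 'ga', 'ti']
```

Run on the groff branch of the snippet proposed below, it reports aq, ha, ga, and ti. On the two non-groff branches only ha, ga, and ti appear, which is what the .ie \n(.g test is for.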
The repertoire of supported special character identifiers varies by
implementation and, after Kernighan's rewrite of troff circa 1980 for
device-independence, by output device. Nevertheless, for
portability/backward compatibility, a set of them are very widely
supported. These include three that your patch takes out, \(ha, \(ga,
and \(ti. Replacing these with ASCII characters will _not_ produce
correct typography on typesetting output devices.
I would attach scans of Tables I and II from "NROFF/TROFF User's
Manual", the version dated 1976, published with Volume 2 of the Unix
Programmer's Manual (1979), and reprinted by Holt, Rinehart and Winston
in 1983, but the linux-man list rejects all attachments bigger than a
breadbox, so I will ask for your trust (or you can ask me for the scans
privately).
Those tables illustrate the glyph repertoire of Ossanna troff and the
special character identifiers that were implemented.
groff_char(7) from groff 1.22.4 and earlier marks the special character
identifiers you can expect to be portable (with "***" in its listings),
and for 1.23 I have added a "History" section to the page which
addresses most of the thousand questions I've asked over the past few
years while trying to learn this stuff. I'll put that in a footnote.[1]
> To fix this I installed the equivalent of the attached further patch to
> TZDB.
I therefore propose the following snippet instead, also taking into
account Solaris 10 troff's poor handling of unsupported font selections
in nroff.
.q + .
To allow for future extensions,
an unquoted name should not contain characters from the set
.ie \n(.g .q \f(CR!$%&\(aq()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
.el .ie t .q \f(CW!$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
. el .q !$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti .
.TP
.B FROM
Gives the first year in which the rule applies.
What do you think?
Regards,
Branden
[1] (Much UTF-8 follows.)
History
A consideration of the typefaces originally available to AT&T nroff
and troff illuminates many conventions that one might regard as
idiosyncratic fifty years afterward. (See section “History” of
roff(7) for more context.) The face used by the Teletype Model 37
terminals of the Murray Hill Unix Room was based on ASCII, but
assigned multiple meanings to several code points, as suggested by
that standard. Decimal 34 (") served as a dieresis accent and
neutral double quotation mark; decimal 39 (') as an acute accent,
apostrophe, and closing (right) single quotation mark; decimal 45
(-) as a hyphen and a minus sign; decimal 94 (^) as a circumflex
accent and caret; decimal 96 (`) as a grave accent and opening
(left) single quotation mark; and decimal 126 (~) as a tilde accent
and (with a half‐line motion) swung dash. The Model 37 bore an
optional extended character set offering upright Greek letters and
several mathematical symbols; these were documented as early as the
kbd(VII) man page of the (First Edition) Unix Programmer’s Manual.
At the time Graphic Systems delivered the C/A/T phototypesetter to
AT&T, the ASCII character set was not considered a standard basis
for a glyph repertoire by traditional typographers. In the stock
Times roman, italic, and bold styles available, several ASCII
characters were not present at all, nor was most of the Teletype’s
extended character set. AT&T commissioned a “special” font to
ensure no loss of repertoire.
A representation of the coverage of the C/A/T’s text fonts follows.
The glyph resembling an underscore is a baseline rule, and that
resembling a vertical line is a box rule. In italics, the box rule
was not slanted. We also observe that the hyphen and minus sign
were already “de‐unified” by the fonts provided, so a decision had
to be taken about which of the two an input “-” should map to.
┌────────────────────────────────────────────────────┐
│A B C D E F G H I J K L M N O P Q R S T U V W X Y Z │
│a b c d e f g h i j k l m n o p q r s t u v w x y z │
│0 1 2 3 4 5 6 7 8 9 fi fl ffi ffl                   │
│! $ % & ( ) ‘ ’ * + - . , / : ; = ? [ ] │           │
│• □ — ‐ _ ¼ ½ ¾ ° † ′ ¢ ® ©                         │
└────────────────────────────────────────────────────┘
The special font supplied the missing ASCII and Teletype extended
glyphs, among several others. The plus, minus, and equals signs
appeared in the special font despite their availability in the text fonts “to
insulate the appearance of equations from the choice of standard
[read: text] fonts”—a priority since troff was turned to the task of
mathematical typesetting as soon as it was developed.
We note that AT&T took the opportunity to de‐unify the
apostrophe/right single quotation mark from the acute accent (a
choice ISO later duplicated in its 8859 series of standards). A
slash intended to be mirror‐symmetric with the backslash was also
included, as was the Bell System logo; we do not attempt to depict
the latter.
┌──────────────────────────────────────────────────────────┐
│α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ ς τ υ ϕ χ ψ ω         │
│Γ Δ Θ Λ Ξ Π Σ Υ Φ Ψ Ω                                     │
│" ´ \ ^ _ ` ~ / < > { } # @ + − = ∗                       │
│≥ ≤ ≡ ≈ ∼ ≠ ↑ ↓ ← → × ÷ ± ∞ ∂ ∇ ¬ ∫ ∝ √ ‾ ∪ ∩ ⊂ ⊃ ⊆ ⊇ ∅ ∈ │
│§ ‡ ☜ ☞ | ○ ⎧ ⎩ ⎫ ⎭ ⎨ ⎬ ⎪ ⌊ ⌋ ⌈ ⌉                         │
└──────────────────────────────────────────────────────────┘
One ASCII character as rendered by the Model 37 was apparently
abandoned. That device printed decimal 124 (|) as a broken vertical
line, like Unicode U+00A6 (¦). No equivalent was available on the
C/A/T; the box rule \[br], brace vertical extension \[bv], and “or”
operator \[or] were used as contextually appropriate.
Devices supported by AT&T device‐independent troff exhibited some
differences in glyph detail. For example, on the Autologic APS‐5
phototypesetter, the square \(sq became filled in the Times bold
face.
[The lowercase Greek letters in the last boxed table above render in
italics where feasible; that is not possible when pasting into a
plain-text email.]