RE: Rendering the em dash on the terminal
From: Jeff Conrad
Subject: RE: Rendering the em dash on the terminal
Date: Mon, 26 Aug 2024 20:39:15 -0700
> From: G. Branden Robinson <g.branden.robinson@gmail.com>
> Sent: Monday, 26 August, 2024 5:34 PM
> Good to hear from you! As the new guy, it's always nice for me when a
> veteran groff maven chimes in.
Veteran, perhaps, because of age, but rusty in recent years ...
> (Veteran groff detractors, not so much. 😅)
>
> [CCing you just in case; if you'd prefer I didn't, please say so.]
>
Aesthetics
==========
> > Dunno if taking up two character cells makes it “look more like a
> > true em dash”;
>
> It does on my terminal, xterm using Liberation Sans Mono.
>
> See attachment.
I get similar results with Consolas on a Windows console. It
looks more like a real em dash in that it’s wider than one cell
(an en?). Still dunno whether it really looks more like a real
em dash. Different is never the same, and monospace fonts are
inherently poor substitutes for the real thing. There is no
substitute for cubic inches!
> The problem I observed is that an em dash should be close to
> one em wide--one em properly considered, that is, as wide as an
> em quad, or as wide as a capital letter is from its top to its
> baseline. Ordinary or "halfwidth" character cell fonts simply
> don't look like that.
If we consider monospace fonts “halfwidth” (or at least half
something), ‘——’ probably does look like a true em dash. But is
“halfwidth” meaningful outside of CJK?
> > Dash List
> > ---------
> In groff 1.24, if you redefine the `EM` string, you'll get
> whatever dash you want there.
I was unaware that this hadn’t always been the case; I checked the AT&T
mmn and mmt files from years ago, and--sure enough--DL uses ‘\(em’
directly rather than the EM string.
This might offer a way of having a different character for a dash
list than elsewhere, but it would eschew the mm tradition of
always using “\*(EM”, whose purpose was to give ‘\(em’ with troff
and ‘--’ with nroff. And what do we do if ‘\(em’ is already
changed to be two em dashes?
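The mm convention described above can be sketched in a few lines of
roff. This is a hedged illustration of the idea only, not the actual
AT&T or groff mm source, whose exact definitions may differ:

```roff
.\" Sketch only: one string whose rendering depends on the output device.
.\" nroff (terminal) mode gets two hyphens; troff (typesetter) mode
.\" gets a real em dash.
.ie n .ds EM --
.el   .ds EM \(em
.\" Authors then write \*(EM wherever they want a dash, for example:
The result holds\*(EMon paper or on screen\*(EMwhatever the device.
```

If DL reads this same string, as in groff 1.24, redefining EM changes
body-text dashes and dash-list marks together.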
Clarity
=======
> The fonts the LWN editor uses seem to render all dash-like
> symbols the same.
>
> https://lwn.net/Articles/948720/
Certainly not the case with any of my editors, though the
distinctions are slight.
> > reasonable rule is that recognition should fail gracefully.
> > Chicago style would use “Anti–Police Terror Project”; suffice
> > it to say that the failure here is less than graceful.
> Might be time to resurrect data transfers over FTP.
I was thinking more of human than data-transmission failures ...
When typeset,
“Anti–Police Terror Project”
would be easily distinguished from
“Anti-Police Terror Project”
but even then, the average person--who probably wouldn’t know an
en dash if it bit them--would read the two as if they were
identical. And for many, the same may be true for an em dash.
Don’t get me going ...
> > Any approach that has an em dash take up two character cells
> > might lead to confusion in a few instances.
>
> Possibly. It _is_ a hazard, but a minor one more than offset by the
> benefit in clarity. My opinion.
Could well be.
> > Two-Em Dash
> > -----------
> > Three-Em Dash
> > -------------
> >
> > I suppose a workaround might be terminal-specific characters like
> > ‘2m’ and ‘3m’. I long had these as strings, more for ease of
> > entry than for handling different devices. In this case, though,
> > it’s not clear how these characters would be handled so there are
> > clear distinctions among ‘em’, ‘2m’, and ‘3m’. And if the
> > typographical convention of ‘--’ were to prevail for ‘em’, I’m
> > not sure how it would apply to ‘2m’ and ‘3m’.
>
> I despair of cutting these knots. For these relatively persnickety
> matters I think I would prefer to trust the document author to define
> strings and exercise formatter facilities to achieve the precise result
> they desire.
You have more faith than I ... I fear the same result as when
people decided we no longer needed parity bits, freeing up the G1
area for additional characters: everyone had a different idea of
what should go where. That iconv(1) exists seems a testament to
pervasive idiocy.
Comments
========
> > This seems reasonable. Most folks can probably figure this
> > out after a bit of head scratching, but it would be nice to
> > spare them the trouble.
>
> I certainly can add something here.
I think this would help. And it might help to mention it
elsewhere for (most) folks who will never look at the code or the
commit.
Copy and Paste
==============
> > How often would someone copy and paste from man(1) output?
>
> I do this frequently.
>
> https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html
I guess I stand corrected 😊.
> If you have a typesetting device (or file format), use it!
Amen! Kinda why troff (and, with it, Unix) was developed.
> This is the man2html story all over again. Most people produce
> online man pages by scraping and (crudely) transforming
> grotty(1) output. That makes me sad. One of my long-term
> goals in groff development is to get people to stop maintaining
> these scraper-converters by offering an alternative that they
> struggle _not_ to prefer.
man Page Format
===============
> > Full disclosure: I format my man pages as PDF, so I may not be
> > the best person to comment on the appearance of output to a
> > monospace device.
>
> Thank you for exercising this pathway. Deri James and I put a lot of
> work into groff 1.23 to make it nice, and further work into the
> forthcoming 1.24 to make it even better.
Next step: a man command that serves up PDF versions if present
in the appropriate places. (I have a crude version that does just
that for my man pages as well as quite a few others, and
additionally serves up PDF versions of Texinfo documentation.)
Untold effort has gone into troff, groff, Texinfo, and others so
that we can do better than a Teletype 33.
Searches
========
> When staring at a Unicode terminal, it's a bad idea to assume
> one knows what character is there based on its appearance.
Often, yes.
> Search this email for 'A'. Now search for 'Α'. But I repeat myself.
>
> Or do I?
In most cases, I think context would suggest the best search.
Would I mix Engrish and Greek? Not with my language skills ...
> If we're making a bad situation worse, it's by only a small
> margin, and the visual clarity in the face of rotten fonts
> again, I think, outweighs the argument against.
Eroff
=====
> I have seen very little on the Internet about eroff, and it
> also seems to be lost software with no extant source (or even
> binaries?). If you would take some time to jot down
> observations about it, that would be helpful to the posterity
> of this community.
I can probably provide a few tidbits, but I’ll need to rely
heavily on memory.
SoftQuad
========
> Even sqtroff seems nearly forgotten in spite of its major role
> in getting groff off the ground.
I never got around to trying this ($$$), though it was on my wish
list, especially after eroff departed the scene.
Strange Strings
===============
> > .ds EM \%\^\v'-.43m'_\h'-\w'_'u/2u'_\h'-3u*\w'_'u/2u'\h'1m'\h'-\w'_'u'_\v'.43m'\^
> . . .
> > apparently eroff would not break a sequence with an
> > unclosed vertical motion.
>
> Interesting. When I get some round tuits I should find out if
> GNU troff will, and if it's worth keeping it from doing so.
Suffice it to say that my definition of EM was bespoke and born
of desperation. Once I was able to create custom soft fonts that
included an em dash, this was a nonissue. But as you know, you
go to war with the army you have. It’s not the army you might
want or wish to have at a later time.
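For anyone puzzling over the string quoted above, here is the same
definition with each motion annotated. The breakdown is an editorial
reading of the escapes, not something taken from eroff documentation;
w stands for \w'_'u, and each underscore is treated as a rule exactly
w wide (real glyphs may fall slightly short of their advance width):

```roff
.\" w = \w'_'u, the width of the underscore glyph (assumed to be a full-width rule)
.\" \%                 at the start of a word: don't hyphenate it
.\" \^                 1/12-em space before the dash
.\" \v'-.43m'          raise .43 em so the underscores sit at dash height
.\" _                  rule from 0 to w
.\" \h'-\w'_'u/2u'     back up w/2 (now at w/2)
.\" _                  rule from w/2 to 3w/2
.\" \h'-3u*\w'_'u/2u'  back up 3w/2 (now at 0); troff evaluates
.\"                    left to right, so this is (-3*w)/2
.\" \h'1m'             forward one em (now at 1m)
.\" \h'-\w'_'u'        back up w (now at 1m - w)
.\" _                  rule from 1m - w to 1m
.\" \v'.43m'           return to the baseline
.\" \^                 1/12-em space after the dash
.ds EM \%\^\v'-.43m'_\h'-\w'_'u/2u'_\h'-3u*\w'_'u/2u'\h'1m'\h'-\w'_'u'_\v'.43m'\^
```

The dash itself advances exactly 1m (plus the two thin spaces). The
three rules cover 0 to w, w/2 to 3w/2, and 1m - w to 1m, so they join
into one unbroken line only if 3w/2 >= 1m - w, that is, if the
underscore is at least 0.4 em wide.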
Hyphenation Control
===================
> > The leading ‘\%’ was added for good measure; I can’t remember proving
> > whether it actually helped.
> [...]
>
> `\%` has recently annoyed me with its ambiguity.
>
> https://lists.gnu.org/archive/html/groff/2024-03/msg00208.html
What this seems to say is that ‘\%’ in the middle of a word
appears to have priority. With sensible coding like
\%antidisestablishmentarianism
.br
\&\%antidisestablishmentarianism
things seem to work as expected.
> https://lists.gnu.org/archive/html/groff/2024-04/msg00000.html
This seems to suggest that ‘\%’ in the middle of a word gives the
first hyphenation point but does not preclude others.
I’m not sure there’s anything wrong with this, but without
mention in the documentation, I agree it’s ambiguous.
Convention, Again
=================
> > So ultimately, I dunno. For the most common usages, ‘——’ may be
> > aesthetically preferable to ‘--’. But in some less common
> > situations, this may confuse more than enhance. I think it’s
> > worth hearing what others think.
>
> For man pages, the mapping can be altered (or removed) in the
> "man.local" and "mdoc.local" files.
>
> More generally, it could be dealt with in the "troffrc-end" file.
Lots of ways to do this for the sophisticated user. But I think the
needs of the average user should take priority.
In closing, it should be noted that the CP1252 device (of which I
probably have the only instance on the planet) also has an em
dash. I render it as such, and agree that it’s not great--so I
guess I need to decide whether to go with ‘--’ or ‘——’. And
whether to do it in the device or in tty.tmac.
Jeff