groff

RE: Rendering the em dash on the terminal


From: Jeff Conrad
Subject: RE: Rendering the em dash on the terminal
Date: Mon, 26 Aug 2024 20:39:15 -0700

> From: G. Branden Robinson <g.branden.robinson@gmail.com>
> Sent: Monday, 26 August, 2024 5:34 PM

> Good to hear from you!  As the new guy, it's always nice for me when a
> veteran groff maven chimes in.

Veteran, perhaps, because of age, but rusty in recent years ...

> (Veteran groff detractors, not so much. 😅)
> 
> [CCing you just in case; if you'd prefer I didn't, please say so.]
> 
Aesthetics
==========
> > Dunno if taking up two character cells makes it “look more like a
> > true em dash”;
> 
> It does on my terminal, xterm using Liberation Sans Mono.
> 
> See attachment.

I get similar results with Consolas on a Windows console.  It
looks more like a real em dash in that it’s wider than one cell
(an en?).  Still dunno whether it really looks more like a real
em dash.  Different is never the same, and monospace fonts are
inherently poor substitutes for the real thing.  There is no
substitute for cubic inches!

> The problem I observed is that an em dash should be close to
> one em wide--one em properly considered, that is, as wide as an
> em quad, or as wide as a capital letter is from its top to its
> baseline.  Ordinary or "halfwidth" character cell fonts simply
> don't look like that.

If we consider monospace fonts “halfwidth” (or at least half
something), ‘——’ probably does look like a true em dash.  But is
“halfwidth” meaningful outside of CJK?

> > Dash List
> > ---------
> In groff 1.24, if you redefine the `EM` string, you'll get
> whatever dash you want there.

I was unaware that this hasn’t been the case; I checked the AT&T
mmn and mmt files from years ago, and--sure enough--DL uses ‘em’.
This might offer a way of having a different character for a dash
list than elsewhere, but it would eschew the mm tradition of
always using “\*(EM”, whose purpose was to give ‘\(em’ with troff
and ‘--’ with nroff.  And what do we do if ‘\(em’ is already
changed to be two em dashes?
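
As a rough sketch of the redefinition described above, and assuming groff mm 1.24 behaves as Branden says, a document could set `EM` separately for typesetter and terminal output. The two-dash choice in the nroff branch is only an illustration, not the package default.

```roff
.\" Sketch only: redefine mm's EM string after the package is loaded.
.\" The nroff rendering below is an arbitrary choice for illustration.
.ie n .ds EM \[em]\[em]
.el   .ds EM \[em]
.P
A parenthetical remark\*(EMlike this one\*(EMuses the string.
```

If `\[em]` is itself remapped to two dashes on the terminal, the nroff branch here would presumably yield four, which is exactly the interaction the question above raises.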

Clarity
=======
> The fonts the LWN editor uses seem to render all dash-like
> symbols the same.
>
> https://lwn.net/Articles/948720/

Certainly not the case with any of my editors, though the
distinctions are slight.

> > reasonable rule is that recognition should fail gracefully.
> > Chicago style would use “Anti–Police Terror Project”; suffice
> > it to say that the failure here is less than graceful.

> Might be time to resurrect data transfers over FTP.

I was thinking more of human than data-transmission failures ...
In typeset,

    “Anti–Police Terror Project”

would be easily distinguished from

    “Anti-Police Terror Project”

but even then, the average person--who probably wouldn’t know an
en dash if it bit them--would read the two as if they were
identical.  And for many, the same may be true for an em dash.
Don’t get me going ...

> > Any approach that has an em dash take up two character cells
> > might lead to confusion in a few instances.
> 
> Possibly.  It _is_ a hazard, but a minor one more than offset by the
> benefit in clarity.  My opinion.

Could well be.

> > Two-Em Dash
> > -----------
> > Three-Em Dash
> > -------------
> >
> > I suppose a workaround might be terminal-specific characters like
> > ‘2m’ and ‘3m’.  I long had these as strings, more for ease of
> > entry than for handling different devices.  In this case, though,
> > it’s not clear how these characters would be handled so there are
> > clear distinctions among ‘em’, ‘2m’, and ‘3m’.  And if the
> > typographical convention of ‘--’ were to prevail for ‘em’, I’m
> > not sure how it would apply to ‘2m’ and ‘3m’.
> 
> I despair of cutting these knots.  For these relatively persnickety
> matters I think I would prefer to trust the document author to define
> strings and exercise formatter facilities to achieve the precise result
> they desire.

You have more faith than I ...  I fear the same result as when
people decided we no longer needed parity bits, freeing up the G1
area for additional characters: everyone had a different idea of
what should go where.  That iconv(1) exists seems a testament to
pervasive idiocy.
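
For illustration, author-defined strings of the kind discussed might look like the sketch below. The names `2m` and `3m` follow the quoted text; the nroff spellings are assumptions chosen only to keep the three dashes distinct, not an established convention.

```roff
.\" Sketch only: hypothetical strings for two-em and three-em dashes.
.\" The nroff spellings are assumptions, not an established convention.
.ie t \{\
.  ds 2m \[em]\[em]
.  ds 3m \[em]\[em]\[em]
.\}
.el \{\
.  ds 2m ----
.  ds 3m ------
.\}
The witness, Mr.\ H\*(2m, declined to answer.
```

Whether adjacent `\[em]` glyphs join cleanly on a typesetter depends on the font, so the troff branch may need kerning adjustments in practice.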

Comments
========
> > This seems reasonable.  Most folks can probably figure this
> > out after a bit of head scratching, but it would be nice to
> > spare them the trouble.
>
> I certainly can add something here.

I think this would help.  And it might help to mention it
elsewhere for (most) folks who will never look at the code or the
commit.

Copy and Paste
==============
> > How often would someone copy and paste from man(1) output?
> 
> I do this frequently.
> 
> https://lists.gnu.org/archive/html/groff/2024-07/msg00062.html

I guess I stand corrected 😊.

> If you have a typesetting device (or file format), use it!

Amen! Kinda why troff (and, with it, Unix) was developed.

> This is the man2html story all over again.  Most people produce
> online man pages by scraping and (crudely) transforming
> grotty(1) output.  That makes me sad.  One of my long-term
> goals in groff development is to get people to stop maintaining
> these scraper-converters by offering an alternative that they
> struggle _not_ to prefer.

man Page Format
===============
> > Full disclosure: I format my man pages as PDF, so I may not be
> > the best person to comment on the appearance of output to
> > a monospace device.
> 
> Thank you for exercising this pathway.  Deri James and I put a lot of
> work into groff 1.23 to make it nice, and further work into the
> forthcoming 1.24 to make it even better.

Next step: a man command that serves up PDF versions if present
in the appropriate places (I have a crude version that does just
that for my man pages as well as quite a few others, and
additionally, for PDF versions of Texinfo documentation).

Untold effort has gone into troff, groff, Texinfo, and others so
that we can do better than a Teletype 33.

Searches
========
> When staring at a Unicode terminal, it's a bad idea to assume
> one knows what character is there based on its appearance.

Often, yes.

> Search this email for 'A'.  Now search for 'Α'.  But I repeat myself.
> 
> Or do I?

In most cases, I think context would suggest the best search.
Would I mix Engrish and Greek?  Not with my language skills ...

> If we're making a bad situation worse, it's by only a small
> margin, and the visual clarity in the face of rotten fonts
> again, I think, outweighs the argument against.

Eroff
=====
> I have seen very little on the Internet about eroff, and it
> also seems to be lost software with no extant source (or even
> binaries?).  If you would take some time to jot down
> observations about it, that would be helpful to the posterity
> of this community.

I can probably provide a few tidbits, but I’ll need to rely
heavily on memory.

Softquad
========
> Even sqtroff seems nearly forgotten in spite of its major role
> in getting groff off the ground.

I never got around to trying this ($$$), though it was on my wish
list, especially after eroff departed the scene.

Strange Strings
===============
> > .ds EM \%\^\v'-.43m'_\h'-\w'_'u/2u'_\h'-3u*\w'_'u/2u'\h'1m'\h'-\w'_'u'_\v'.43m'\^
> >  .  .  .
> > apparently eroff would not break a sequence with an
> > unclosed vertical motion.
> 
> Interesting.  When I get some round tuits I should find out if
> GNU troff will, and if it's worth keeping it from doing so.

Suffice it to say that my definition of EM was bespoke and born
of desperation.  Once I was able to create custom soft fonts that
included an em dash, this was a nonissue.  But as you know, you
go to war with the army you have.  It’s not the army you might
want or wish to have at a later time.

Hyphenation Control
===================
> > The leading ‘\%’ was added for good measure; I can’t remember proving
> > whether it actually helped.
> [...]
> 
> `\%` has recently annoyed me with its ambiguity.
> 
> https://lists.gnu.org/archive/html/groff/2024-03/msg00208.html

What this seems to say is that ‘\%’ in the middle of a word
appears to have priority. With sensible coding like

\%antidisestablishmentarianism
.br
\&\%antidisestablishmentarianism

things seem to work as expected.

> https://lists.gnu.org/archive/html/groff/2024-04/msg00000.html

This seems to suggest that ‘\%’ in the middle of a word gives the
first hyphenation point but does not preclude others.

I’m not sure there’s anything wrong with this, but without
mention in the documentation, I agree it’s ambiguous.
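
A small test file along the following lines may help anyone who wants to observe the behavior directly. It is only a sketch: the line length and the `.hy` setting are arbitrary choices, and no particular output is claimed, since results may differ between groff versions.

```roff
.\" Sketch only: a narrow line length to provoke hyphenation.
.\" No particular output is claimed; results may vary by version.
.ll 20n
.hy 1
Test \%antidisestablishmentarianism
.br
Test anti\%disestablishmentarianism
.br
Test antidisestablishmentarianism
```

Formatting it with something like `groff -Tascii` and comparing the three paragraphs should show whether a mid-word `\%` suppresses the automatic hyphenation points or merely adds one.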

Convention, Again
=================
> > So ultimately, I dunno.  For the most common usages, ‘——’ may be
> > aesthetically preferable to ‘--’.  But in some less common
> > situations, this may confuse more than enhance.  I think it’s
> > worth hearing what others think.
> 
> For man pages, the mapping can be altered (or removed) in the
> "man.local" and "mdoc.local" files.
> 
> More generally, it could be dealt with in the "troffrc-end" file.

Lots of ways to do this for the sophisticated user.  But I think the
needs of the average user should take priority.
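
For reference, a site-local override of the kind Branden mentions might look like the sketch below. It assumes the lines go in "man.local" and that a `\[em]` mapping is already in effect on terminal devices, which depends on the groff version.

```roff
.\" Sketch only, for man.local: choose how \[em] renders on terminals.
.\" Whether a mapping already exists depends on the groff version.
.if n \{\
.\" Map the em dash to two minus signs on terminal devices.
.  char \[em] \-\-
.\" To drop a character definition instead, one could use:
.\"   .rchar \[em]
.\}
```

The same requests could presumably go in "mdoc.local" or "troffrc-end" for the other cases mentioned.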

In closing, it should be noted that the CP1252 device (of which I
probably have the only instance on the planet) also has an em
dash.  I render it as such, and agree that it’s not great--so I
guess I need to decide whether to go with ‘--’ or ‘——’.  And
whether to do it in the device or in tty.tmac.

Jeff



