Re: [Groff] Permissible characters for hyphenation
From: Steffen Nurpmeso
Subject: Re: [Groff] Permissible characters for hyphenation
Date: Mon, 30 May 2016 17:09:14 +0200
User-agent: s-nail v14.8.8-237-g587085a
John Gardner <address@hidden> wrote:
|On 30 May 2016 at 23:20, Steffen Nurpmeso <address@hidden> wrote:
|> John Gardner <address@hidden> wrote:
|>|> I have been convinced that soft hyphen is a control character and
|>|> not something visual,
|>|
|>|Almost correct.
|>|
|>|Soft hyphens *do* describe potential breaking points, but they only become
|>|visible when surrounding text is broken.
|> Yes. For display purposes however i think U+00AD can't be used
|> directly, but will be replaced by the renderer to either nothing,
|> if no wrap is to be applied at the character position, or
|> something appropriate, like ASCII hyphen-minus or some extended
|>|Web authors were encouraged to use the more semantic and reliable <wbr/>
|>|element <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr>
|>|instead.
|>
|> I am, for one, sure that the HTML standard committee will someday
|> manage to add markup for shitty baby napkins. The palms and
|> beaches of their happenings seem to promote this direction. ^.^
|
|I, uh, think something might've been lost in translation. :|
So. That made me search the web and i've found:
On UTF-8 encoded pages, <wbr> behaves like the U+200B ZERO-WIDTH
SPACE code point. In particular, it behaves like a Unicode bidi
BN code point, meaning it has no effect on bidi-ordering: <div
dir=rtl>123,<wbr>456</div> displays, when not broken on two
lines, 123,456 and not 456,123.
For the same reason, the <wbr> element does not introduce
a hyphen at the line break point. To make a hyphen appear only
at the end of a line, use the soft hyphen character entity
(&shy;) instead.
This element [.] was officially defined in HTML5.
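To make the difference between the two break markers concrete, here is a minimal Python sketch. It is not from the quoted page or from any real renderer: the function `wrap` and its greedy, conservative fitting rule are assumptions made up for illustration. It only shows the behaviour described above: U+00AD turns into a visible hyphen when the line is broken there and into nothing otherwise, while U+200B (which <wbr> is said to behave like) never produces a hyphen.

```python
import re

SHY = "\u00ad"   # SOFT HYPHEN
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the code point <wbr> is said to behave like


def _at_break(sep):
    """What a separator looks like at the end of a broken line."""
    return "-" if sep == SHY else ""


def _inline(sep):
    """What a separator looks like when the line is not broken there."""
    return " " if sep == " " else ""


def wrap(text, width):
    """Greedily break text into lines of at most `width` visible characters.

    Hypothetical helper: break opportunities are spaces, SHY and ZWSP only.
    A piece longer than `width` is left on a line of its own (it overflows).
    """
    # Split into pieces and the separators between them; the last piece
    # is paired with an empty separator.
    tokens = re.split("([ " + SHY + ZWSP + "])", text)
    pieces = list(zip(tokens[0::2], tokens[1::2] + [""]))

    lines, line, pending = [], "", ""
    for piece, sep in pieces:
        candidate = line + _inline(pending) + piece
        # Conservative rule: also reserve room for the hyphen that would be
        # needed if the line were broken right after this piece.
        if line and len(candidate) + len(_at_break(sep)) > width:
            lines.append(line + _at_break(pending))
            candidate = piece
        line, pending = candidate, sep
    lines.append(line)
    return lines


print(wrap("hyphen" + SHY + "ation", 20))  # ['hyphenation']
print(wrap("hyphen" + SHY + "ation", 8))   # ['hyphen-', 'ation']
print(wrap("123," + ZWSP + "456", 5))      # ['123,', '456']
```

A real line breaker would follow the Unicode line breaking rules and would pick the hyphen glyph from the font or language; the sketch hard-codes ASCII hyphen-minus only to keep it short.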
My opinion: HTML was derived from SGML as a strict abstraction of
content and form(atting).
But afaik HTML has required any conforming application to support
Unicode for quite a long time now, so why duplicate the behaviour?
Is it because of «explicit is better»? So, then.
Fine. I also would use <span> above, but the bigger the choice,
the harder it is to choose (www.dict.cc).
Years ago i read Korpela's rant on this topic, but Markus Kuhn
also has something nice to say:
The original HTML 2 specification [6] by Tim Berners-Lee et al.,
still wisely leaves the semantics of SOFT HYPHEN untouched with
the remark
NOTE - Use of the non-breaking space and soft hyphen indicator
characters is discouraged because support for them is not
widely deployed.
Unfortunately by HTML 4 [7], this had mutated into a complete
reinterpretation of the purpose of the SOFT HYPHEN, compared to
how it had been used over the past decade in output devices.
What was originally a graphical character had turned into an
invisible marker for a hyphenation opportunity:
[.]
This HTML 4 reinterpretation is essentially the semantics that
Unicode then adopted as well.
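For what it is worth, the "invisible marker" reading can be checked against the Unicode character database that ships with Python. This is only a small sketch under the assumption that the standard `unicodedata` module is available; the helper name `describe` is made up here. It prints name, general category and bidi class of the soft hyphen, of the zero-width space, and of the plain ASCII hyphen-minus for comparison.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, bidi class) for one code point."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


# Soft hyphen, zero-width space, and the visible hyphen-minus.
for ch in ("\u00ad", "\u200b", "-"):
    print("U+%04X" % ord(ch), *describe(ch))
```

The soft hyphen is reported as a format character (category Cf) with bidi class BN, just like the zero-width space, whereas hyphen-minus is dash punctuation (Pd).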
May the majority be with you.
--steffen