Re: an observation and proposal about hyphenation codes

From:	G. Branden Robinson
Subject:	Re: an observation and proposal about hyphenation codes
Date:	Tue, 6 Aug 2024 15:52:47 -0500

Hi Dave,

At 2024-08-06T15:28:25-0500, Dave Kemper wrote:
> On Tue, Aug 6, 2024 at 1:34 PM G. Branden Robinson
> <g.branden.robinson@gmail.com> wrote:
> > The hyphenation language (`.hla`) and hyphenation mode (`.hy`) are
> > the same for these two scenarios.
> 
> Yes, sloppy wording on my part.  By "default hyphenation" I meant no
> aspect of it was changed by the input file.  Command-line switches of
> course had an effect.

Understood.

> > Therefore these characters did not acquire nonzero hyphenation
> > codes, and therefore were not valid hyphenation breakpoints.
> >
> > Does this make sense?
> 
> Yes.  It makes me wonder about the wisdom of commit 0629380a9's move
> of the .hcode blocks.  That is, I understand the reasoning for it you
> and Werner put forth, that the underlying groff design didn't
> contemplate a single run needing different languages' hyphenation
> support.

But it also didn't quite rule it out.  We have been generating a
document bearing this requirement since before the 1.23.0 release --
groff-man-pages.{pdf,utf8.txt}.  It switches from English to Swedish and
back to render groff_mmse(7).

You can observe the dance that we perform to achieve this in our
"doc" directory's Automake file.

https://git.savannah.gnu.org/cgit/groff.git/tree/doc/doc.am?h=1.23.0#n251
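
For a single document, as opposed to the build-time arrangement in
doc.am, a language switch of this kind might be sketched as follows.
This is an illustration only, not a copy of what doc.am does.  It
assumes that sourcing a localization file with `mso` in mid-document is
enough to change the hyphenation language, and that "sv.tmac" and
"en.tmac" are on the macro search path.

```roff
.\" Text before any switch uses the localization already in effect
.\" (assumed here to be English).
This sentence is hyphenated according to English rules.
.\" Switch to Swedish: sv.tmac is assumed to select the Swedish
.\" hyphenation language and patterns.
.mso sv.tmac
Denna text avstavas enligt svenska regler.
.\" Switch back to English.
.mso en.tmac
This sentence is hyphenated according to English rules again.
```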

> But tying an initial hyphenation scheme to a language seems to at
> least tie it to the right thing at the outset, whereas tying it to an
> encoding perhaps doesn't.

There are two aspects to the hyphenation scheme, in this sense.

1.  which characters are letters in the given character encoding
2.  which letters behave exactly like other letters for hyphenation
    purposes in a given language

Point 1 is determined by the character encoding.  Point 2 is too, in
part, for case-folding purposes.

The remainder of point 2 would cover situations like "hyphenate 'n' just
like 'ñ'", as Spanish hypothetically might.  However, to date, this
remainder has never been addressed by groff's hyphenation support.  It
could be--it just demands contributors with the requisite knowledge of
their language's hyphenation rules.
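
As a rough illustration of the two points, the `hcode` request takes
pairs of a character and the hyphenation code it should receive.  The
lines below are a sketch, not text copied from any groff macro file.
They assume the input is read in an 8-bit encoding that contains these
letters, and the `ñ` line is hypothetical, since groff does nothing of
the kind today.

```roff
.\" Point 1: the encoding determines which characters are letters.
.\" A character can take part in hyphenation only if its hyphenation
.\" code is nonzero; here the lowercase letter is its own code.
.hcode é é
.\" Point 2, case folding: the capital hyphenates exactly like the
.\" lowercase letter.
.hcode É é
.\" Point 2, the remainder (hypothetical and language-specific):
.\" give ñ the same code as n so that the two are treated alike.
.hcode ñ n
```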

You may notice something unusual about "latin5.tmac" in Git HEAD:

.hcode İ i \" exceptional case; move to tr.tmac if we ever get one

...which, I'll grant, makes "point 1" more complicated again.  Most
languages don't change the lettercase mapping rules.  Most languages
aren't Turkish.

I guess I should add

.hcode I ı

too, huh?

> > If so, what I will do is make "en.tmac" `.mso latin1.tmac`.
> 
> That will solve the problem for English.  Are there other language
> files that will need it?

Every other groff localization file for a Western language -- almost --
`mso`s an encoding macro file already.

$ grep mso tmac/{cs,de,den,es,fr,it,ru,sv}.tmac | grep -v trans
tmac/cs.tmac:.mso latin2.tmac
tmac/de.tmac:.mso latin1.tmac
tmac/den.tmac:.do mso de.tmac
tmac/es.tmac:.mso latin9.tmac
tmac/fr.tmac:.mso latin9.tmac
tmac/ru.tmac:.mso koi8-r.tmac
tmac/sv.tmac:.mso latin1.tmac

I will therefore add

.mso latin1.tmac

to both "en.tmac" _and_ "it.tmac".

> Will some language files need other tmac/latin*.tmac sourced?

Yes, but they have them already, and in some cases for a long time.

$ git blame tmac/fr.tmac | grep 'mso.*latin'
fd7264f136 (Werner LEMBERG      2006-02-07 05:46:08 +0000 156) .mso latin9.tmac

Regards,
Branden
