Re: [Groff] Character class query

From:	Werner LEMBERG
Subject:	Re: [Groff] Character class query
Date:	Mon, 02 Mar 2009 15:53:40 +0100 (CET)

> I'm working on resurrecting Brian M. Carlson's work on character
> classes, and attempting to update it in the light of Werner's
> comments in the short thread on the subject in January 2008.

Great!

> I have a query about my planned design that I'd like to run past you
> first, though. After playing about with a few possibilities, I noted
> that, at least in theory, we would want to be able to apply several of
> the same sets of attributes to character classes as we do to individual
> groff entities: [...]

Yes.

> [...] it seems sensible to simply put character classes in the same
> symbol table as ordinary groff entities, and add character-range and
> class-nesting support to 'class charinfo'.

Good idea.  It's so simple that no one has had this idea before.
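
A rough sketch of the shared-table idea, written in Python rather than
groff's C++ and not taken from any groff source.  `CharInfo' is only a
stand-in for 'class charinfo'; the `define' and `contains' helpers and
the class names `hiragana', `katakana', and `kana' are hypothetical.
The two ranges are the Unicode Hiragana and Katakana blocks.

```python
from dataclasses import dataclass, field


@dataclass
class CharInfo:
    """Hypothetical stand-in for groff's 'class charinfo'."""
    name: str
    flags: int = 0
    # Inclusive (low, high) code point pairs belonging to this class.
    ranges: list = field(default_factory=list)
    # Names of other classes nested inside this one.
    nested: list = field(default_factory=list)


# One symbol table shared by ordinary entities and character classes.
symbols = {}


def define(name, ranges=(), nested=(), flags=0):
    """Register an entity or class under `name` in the shared table."""
    info = CharInfo(name, flags, list(ranges), list(nested))
    symbols[name] = info
    return info


def contains(name, code, seen=None):
    """Return True if the class `name` includes the code point `code`.

    `seen` guards against classes that nest each other in a cycle.
    """
    seen = set() if seen is None else seen
    if name in seen or name not in symbols:
        return False
    seen.add(name)
    info = symbols[name]
    if any(lo <= code <= hi for lo, hi in info.ranges):
        return True
    return any(contains(child, code, seen) for child in info.nested)


define("hiragana", ranges=[(0x3040, 0x309F)])
define("katakana", ranges=[(0x30A0, 0x30FF)])
define("kana", nested=["hiragana", "katakana"])
```

With these definitions, `contains("kana", 0x3042)' is true because the
lookup descends into the nested `hiragana' class.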

> Obviously a class that consisted of more than just a single
> character wouldn't have a Unicode codepoint or a glyph number or
> anything, and \[CJKprepunct] wouldn't produce any output, but
> '.cflags 2 \[CJKprepunct]' or whatever would be a sensible thing to
> write.

We could introduce a naming convention for character classes, say,
starting such names with a dot, including the word `class' in the name,
or something similar.  Since the list of groff entities is not
extensible, we have a broad range of possibilities.  We could even use
names similar to POSIX character ranges, e.g.,

  .char \C'[:digit:]' 0123456789
  abc\C'[:digit:]'abc

Note that entities with a `]' in their names can't be accessed with
\[...]; this might work as an additional protection against accidental
misuse.

> A simple initial implementation would essentially just change the
> accessor methods of 'class charinfo' to look through all registered
> character classes for ones that include the current character
> (intentionally vague here as I haven't yet worked out how to deal
> with ranges of Unicode codepoints that haven't been given entity
> indices).

This should probably support fall-back classes too, similar to the
current mechanism for ordinary entities.
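
One possible reading of this, sketched in Python under stated
assumptions: flags set directly on a character and flags from every
matching class are combined, and fall-back classes are consulted only
when nothing else applies.  The function name, the data layout, and
the flag values are all placeholders, not groff's actual behaviour.

```python
# Placeholder flag bits; the numeric values carry no meaning here.
BREAK_BEFORE = 2
BREAK_AFTER = 4


def in_ranges(code, ranges):
    """True if `code` lies in any inclusive (low, high) pair."""
    return any(lo <= code <= hi for lo, hi in ranges)


def effective_flags(code, char_flags, classes, fallback_classes):
    """Compute the flags for `code` by scanning all registered classes.

    char_flags:        dict mapping a code point to flags set on it directly
    classes:           list of (ranges, flags) pairs, scanned linearly
    fallback_classes:  same shape, used only if nothing above matched
    """
    flags = char_flags.get(code, 0)
    matched = code in char_flags
    for ranges, class_flags in classes:
        if in_ranges(code, ranges):
            flags |= class_flags
            matched = True
    if not matched:
        for ranges, class_flags in fallback_classes:
            if in_ranges(code, ranges):
                flags |= class_flags
    return flags


# Illustrative data: hiragana may break after; everything else falls back.
classes = [([(0x3040, 0x309F)], BREAK_AFTER)]
fallback_classes = [([(0x0000, 0x10FFFF)], BREAK_BEFORE)]
```

The linear scan is in the spirit of the "simple initial implementation"
quoted above; a sorted interval table could replace it later without
changing the interface.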

> For a small number of classes this ought to be perfectly adequate,
> and the lookups can be optimised later. My immediate needs (CJK
> support, of course!) only seem to require classes for
> no-break-before and no-break-after kinsoku shori, a general notion
> of "CJK character" so that we can adjust kerning between CJK and
> Latin characters,

This notion is also necessary to indicate that a break after the
current CJK character is allowed.
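
To illustrate how these three notions interact, here is a small Python
sketch, not groff code.  It assumes a break is allowed after any CJK
character unless that character is in a no-break-after set or the next
character is in a no-break-before set.  The ranges and the two small
character sets are illustrative only and far from complete.

```python
# Illustrative ranges only: CJK Symbols and Punctuation, Hiragana,
# Katakana, and CJK Unified Ideographs.
CJK_RANGES = [(0x3000, 0x303F), (0x3040, 0x309F),
              (0x30A0, 0x30FF), (0x4E00, 0x9FFF)]

# Tiny sample sets for kinsoku shori; real sets are much larger.
NO_BREAK_BEFORE = {"。", "、", "」"}   # must not start a line
NO_BREAK_AFTER = {"「"}               # must not end a line


def is_cjk(ch):
    """True if `ch` falls in one of the illustrative CJK ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)


def break_points(text):
    """Return each index i where a break between text[i-1] and text[i]
    is allowed under the assumptions stated above."""
    points = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if not is_cjk(prev):
            continue                  # only breaks after CJK characters
        if prev in NO_BREAK_AFTER or nxt in NO_BREAK_BEFORE:
            continue                  # kinsoku shori forbids this break
        points.append(i)
    return points
```

For the string "「日本語」です。" this allows breaks only at indices 2, 3,
5, and 6: never right after the opening bracket, and never before the
closing bracket or the full stop.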

> and a class for double-width characters.

Not on the input side.

> I assume that the latter two would need to be done on the font side,

Exactly.

> BTW, in light of Werner's comments that glyphs are strictly an output
> notion, it isn't half confusing that 'class charinfo' is based on
> 'struct glyph' ...

Well, those names are historical, and while James Clark implemented
the character/glyph separation quite cleanly, he didn't pay much
attention to proper structure and class names.


    Werner
