Re: [Groff] Character class query
From: Werner LEMBERG
Date: Mon, 02 Mar 2009 15:53:40 +0100 (CET)
> I'm working on resurrecting Brian M. Carlson's work on character
> classes, and attempting to update it in the light of Werner's
> comments in the short thread on the subject in January 2008.
Great!
> I have a query about my planned design that I'd like to run past you
> first, though. After playing about with a few possibilities, I noted
> that, at least in theory, we would want to be able to apply several of
> the same sets of attributes to character classes as we do to individual
> groff entities: [...]
Yes.
> [...] it seems sensible to simply put character classes in the same
> symbol table as ordinary groff entities, and add character-range and
> class-nesting support to 'class charinfo'.
Good idea. It's so simple that no one has had this idea before.
> Obviously a class that consisted of more than just a single
> character wouldn't have a Unicode codepoint or a glyph number or
> anything, and \[CJKprepunct] wouldn't produce any output, but
> '.cflags 2 \[CJKprepunct]' or whatever would be a sensible thing to
> write.
We could introduce a naming convention for character classes, say,
starting such names with a dot, putting the word `class' in the name,
or something similar. Since the list of groff entities is not
extensible, we have a broad range of possibilities. We could even use
names similar to POSIX character ranges, e.g.,
.char \C'[:digit:]' 0123456789
abc\C'[:digit:]'abc
Note that entities with a `]' in their names can't be accessed with
\[...]; this might work as an additional protection against accidental
misuse.
> A simple initial implementation would essentially just change the
> accessor methods of 'class charinfo' to look through all registered
> character classes for ones that include the current character
> (intentionally vague here as I haven't yet worked out how to deal
> with ranges of Unicode codepoints that haven't been given entity
> indices).
This should probably support fall-back classes too, similar to the
current mechanism for ordinary entities.
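To make the quoted lookup idea concrete, here is a minimal sketch in Python rather than in groff's own C++. It is only an illustration under stated assumptions: `CharClass`, `REGISTRY`, `register`, `contains`, and `effective_flags` are hypothetical names, not part of groff, and the class `CJKopen` is invented for the example. A class holds individual code points, inclusive code-point ranges, and the names of nested classes; the accessor simply walks every registered class. Fall-back classes are not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class CharClass:
    """Hypothetical stand-in for a class entry; not groff's 'class charinfo'."""
    name: str
    flags: int = 0
    chars: set = field(default_factory=set)     # individual code points
    ranges: list = field(default_factory=list)  # inclusive (lo, hi) pairs
    nested: list = field(default_factory=list)  # names of included classes

REGISTRY = {}  # class name -> CharClass

def register(cls):
    REGISTRY[cls.name] = cls
    return cls

def contains(cls, cp, _seen=None):
    """True if code point cp is in cls, directly, by range, or by nesting."""
    seen = set() if _seen is None else _seen
    if cls.name in seen:  # guard against cyclic nesting
        return False
    seen.add(cls.name)
    if cp in cls.chars:
        return True
    if any(lo <= cp <= hi for lo, hi in cls.ranges):
        return True
    return any(contains(REGISTRY[n], cp, seen)
               for n in cls.nested if n in REGISTRY)

def effective_flags(cp, own_flags=0):
    """OR the character's own flags with those of every class containing it."""
    flags = own_flags
    for cls in REGISTRY.values():  # linear scan; fine for a few classes
        if contains(cls, cp):
            flags |= cls.flags
    return flags

# Assumed example data: two opening brackets in a nested class, one range.
register(CharClass("CJKopen", chars={0x300C, 0x300E}))
register(CharClass("CJKprepunct", flags=2,
                   ranges=[(0x3008, 0x3008)], nested=["CJKopen"]))
```

With this data, `effective_flags(0x300C)` picks up flag 2 through the nested class, while an ordinary Latin letter keeps only its own flags. The linear scan matches the "optimise later" approach in the quoted text; an interval tree or a per-character cache would be the obvious next step.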
> For a small number of classes this ought to be perfectly adequate,
> and the lookups can be optimised later. My immediate needs (CJK
> support, of course!) only seem to require classes for
> no-break-before and no-break-after kinsoku shori, a general notion
> of "CJK character" so that we can adjust kerning between CJK and
> Latin characters,
This notion is also necessary to indicate that a break after the
current CJK character is allowed.
> and a class for double-width characters.
Not on the input side.
> I assume that the latter two would need to be done on the font side,
Exactly.
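As a rough illustration of how the no-break-before and no-break-after classes and the general "CJK character" notion mentioned above could interact, here is a small Python sketch. Everything in it is an assumption made for the example: the function names are hypothetical, the two kinsoku sets are tiny and incomplete, and the code-point ranges are a crude approximation of "CJK". It says nothing about how groff itself decides breaks.

```python
# Illustrative, deliberately incomplete sets; real kinsoku tables are larger.
NO_BREAK_AFTER = set("「『（")       # opening brackets must not end a line
NO_BREAK_BEFORE = set("」』）。、")  # closers and punctuation must not start one

# Crude approximation of "CJK character" by code-point range (assumption).
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]

def is_cjk(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)

def break_allowed_between(prev, nxt):
    """A break after a CJK character is allowed unless a kinsoku class forbids it."""
    return (is_cjk(prev)
            and prev not in NO_BREAK_AFTER
            and nxt not in NO_BREAK_BEFORE)

def break_points(text):
    """Indices i such that a line may be broken between text[i-1] and text[i]."""
    return [i for i in range(1, len(text))
            if break_allowed_between(text[i - 1], text[i])]
```

For the string `「日本」。a` this yields breaks only between the two ideographs and after the full stop: the opening bracket forbids a break after itself, and the closing bracket and full stop forbid a break before themselves.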
> BTW, in light of Werner's comments that glyphs are strictly an output
> notion, it isn't half confusing that 'class charinfo' is based on
> 'struct glyph' ...
Well, those names are historical, and while James Clark implemented
the character/glyph separation quite cleanly, he didn't pay much
attention to proper structure and class names.
Werner