emacs-devel
Re: [w32] display international HELLO


From: Richard Wordingham
Subject: Re: [w32] display international HELLO
Date: Tue, 20 Nov 2007 01:49:14 -0000

Kenichi Handa wrote:

> Richard Wordingham writes:
>
>> 3. Compositions of Lao characters (i.e. with the 'composition' string
>> property) using the Code2000 font (the only fully working Lao font I
>> have) do not display properly, whether they are in the Lao or
>> mule-unicode-0100-24ff charset.
>
> I'm going to allow each font backend to generate proper
> composition information that will vary depending on the font,
> instead of the current fixed way of composition.  So, on
> Windows, perhaps the font backend can utilize Uniscribe.

For OpenType fonts in scripts supported by Uniscribe, that's generally the way to go - especially for quick results. Might Pango be superior, even on MS Windows, though? It was very noticeable that when Unicode belatedly added U+0BB6 TAMIL LETTER SHA, Uniscribe refused to treat it as a Tamil letter, let alone form the shri ligature from it in those fonts that had been updated. (Previously the shri ligature had been implemented via the hack of using U+0BB7 TAMIL LETTER SSA instead.)

There is another composition technology around, intended to cater for those scripts that Uniscribe supports inadequately or not at all, namely Graphite from SIL. For some time it was the only way of supporting the Burmese script in Unicode on Windows. (I don't know if Windows Vista and related products support the Burmese script, at least for Burmese. I'd be impressed if the Shan extensions were in.) The OpenType font carries extra tables for Graphite, so an application (such as at least some versions of Firefox and OpenOffice) can tell whether to use Graphite or to hand the font's GSUB and GPOS tables to Uniscribe/Pango. (I presume similar considerations apply to the Apple-defined mort and morx tables.) By putting the composition knowledge in the font, Graphite even allows one to encode complex scripts in the Private Use Areas.
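To make the table-based choice concrete, here is a rough Python sketch of how an application might pick a composition engine from a font's table directory. It assumes the Graphite rules live in a table tagged 'Silf', and the priority order in choose_engine() is my own guess at a policy, not what Firefox or OpenOffice actually do. The fake_font() helper is purely illustrative and builds a table directory with no table data behind it.

```python
import struct

GRAPHITE_TAG = "Silf"             # Graphite rule table (assumed marker for Graphite support)
OPENTYPE_TAGS = {"GSUB", "GPOS"}  # handled by Uniscribe or Pango
AAT_TAGS = {"mort", "morx"}       # Apple-defined tables

def table_tags(font_bytes):
    """Return the set of table tags listed in an sfnt table directory."""
    # Header: 4-byte sfnt version, uint16 table count, then three uint16 search fields.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    tags = set()
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        (tag,) = struct.unpack_from(">4s", font_bytes, 12 + 16 * i)
        tags.add(tag.decode("latin-1"))
    return tags

def choose_engine(tags):
    """Hypothetical policy: prefer Graphite whenever the font carries it."""
    if GRAPHITE_TAG in tags:
        return "graphite"
    if tags & OPENTYPE_TAGS:
        return "opentype"
    if tags & AAT_TAGS:
        return "aat"
    return "none"

def fake_font(*tags):
    """Illustrative helper: a table directory only, with zeroed records."""
    header = struct.pack(">4sHHHH", b"\x00\x01\x00\x00", len(tags), 0, 0, 0)
    records = b"".join(
        struct.pack(">4sIII", t.encode("latin-1"), 0, 0, 0) for t in tags
    )
    return header + records

print(choose_engine(table_tags(fake_font("cmap", "GSUB", "GPOS", "Silf"))))
```

With a real font one would pass the file's bytes to table_tags() instead of using fake_font().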

Incidentally, part of the reason for the poor Lao rendering was that in Emacs 22.1 on MS Windows the font was being treated as encoded by an 'ANSI' sequence. I've fixed that problem by adding some MS Windows only code to append_composite_glyph() in xdisp.c to apply the identification rules in the same way as is done for uncomposed characters, but that doesn't really seem the best place for it. Populating and using the unused field font_type in W32FontStruct would be a cleaner solution. (A cleaner solution still would be to always use ExtTextOutW instead of ExtTextOutA - Emacs 22.1 always generates an intermediate sequence of 16-bit codes, but the burden of recoding for hack fonts might then be transferred from the OS to Emacs.) Judging by the outputs, I think this bug is still present in Emacs 23.0.60.0 (if I can trust version.el). Most spectacularly, plain text 'underlined' 'o' <U+006F U+0331> renders as 'o' with the digit '1' written below it!
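One plausible reading of the '1' under the 'o' is that each 16-bit code was narrowed to a single byte on the 8-bit path, so only the low byte of U+0331 survived. The Python sketch below illustrates just that arithmetic; the as_ansi_byte() helper is hypothetical and is not a claim about what the Emacs or Windows code actually does.

```python
def as_ansi_byte(code_unit):
    """Hypothetical narrowing step: keep only the low 8 bits of a 16-bit code."""
    return code_unit & 0xFF

text = "o\u0331"                          # 'o' + COMBINING MACRON BELOW
code_units = [ord(ch) for ch in text]     # both are BMP characters, one 16-bit code each
narrowed = "".join(chr(as_ansi_byte(u)) for u in code_units)

# U+006F keeps its value; U+0331 collapses to 0x31, the ASCII digit '1'.
print([hex(u) for u in code_units], "->", repr(narrowed))
```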

This then exposes the next set of problems. First, Uniscribe often refuses to draw a combining mark on its own (prefixing U+00A0 might work). Second, one has to determine when a composition should be left to Uniscribe. The latter is slightly complicated by such features as an ASCII or Latin-1 base character plus a combining mark, admittedly fairly rare if one is using Normal Form Composed (NFC). (Indic transliteration and typewriter-based American Indian orthographies are the best sources, e.g. underlining for nasal vowels in Choctaw.) In these cases, the character sequence is broken, at least in Emacs 22.1, because the base and combining characters seem to come from different fonts!
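Both points can be sketched in a few lines of Python using the standard unicodedata module. The function names are mine, and the test on the canonical combining class is a simplification: some marks, including several Lao vowel signs, have class 0 and would be missed by it.

```python
import unicodedata

NBSP = "\u00a0"

def needs_composition(text):
    """True if a combining mark is still present after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    # combining() returns the canonical combining class; 0 means "not reordered",
    # which covers base characters but also a few marks (a known gap in this sketch).
    return any(unicodedata.combining(ch) for ch in nfc)

def displayable(text):
    """Prefix NO-BREAK SPACE when the text starts with a combining mark."""
    if text and unicodedata.combining(text[0]):
        return NBSP + text
    return text

print(needs_composition("o\u0301"))  # o + acute has a precomposed form in NFC
print(needs_composition("o\u0331"))  # o + macron below has none, so the mark survives
```

The second case is the Choctaw-style underlining: NFC cannot remove the combining mark, so the renderer still has to compose it.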

I'm tempted to go for the brute force rule of assuming that the combining marks are always taken from the same OpenType font as the base character and giving the job to Uniscribe. This hits the practical problem that many OpenType fonts don't stack arbitrary combinations of diacritic marks. However, I have seen an Emacs-related statement that it is the user's responsibility to provide a font that works properly.
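As a rough illustration of the brute force rule, the Python sketch below splits text into runs of one base character plus the marks that follow it, so that each run could be given to a single font and shaper. It uses the General Category (Mn, Mc, Me) to recognise marks, which is my simplification rather than anything Emacs or Uniscribe does, and it ignores joiners and other cluster rules.

```python
import unicodedata

def is_mark(ch):
    """Treat nonspacing, spacing and enclosing marks as combining."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def clusters(text):
    """Group each base character with the marks that immediately follow it."""
    out = []
    for ch in text:
        if out and is_mark(ch):
            out[-1] += ch      # the mark joins the current base's run
        else:
            out.append(ch)     # new base, or a stray mark at the very start
    return out

# Latin base plus U+0331, then a Lao consonant with a vowel sign and a tone mark.
print(clusters("o\u0331k"))
print(clusters("\u0e81\u0eb4\u0ec8"))
```

A mark with no preceding base ends up as its own run, which is exactly the case where the U+00A0 prefix would be needed.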

Richard.



