Re: [w32] display international HELLO

emacs-devel

From:	Richard Wordingham
Subject:	Re: [w32] display international HELLO
Date:	Fri, 9 Nov 2007 08:37:42 -0000

On 31 January 2007, Takashi Hiromatsu wrote (archived at http://lists.gnu.org/archive/html/emacs-devel/2007-01/msg01087.html ):

I'm trying to display every language's "HELLO" in Emacs on Windows by using the original Microsoft TrueType fonts.
   --- Emacs/22.0.92 (i386-mingw-nt5.0.2195)

I succeeded with many of them using the "Arial Unicode MS" font, except for the 7 languages listed below:
   Amharic
   Arabic
   Braille
   Hindi
   Kannada
   Malayalam
   Tibetan

I wrote only font settings in my ~/.emacs shown below.
----------------------------------------------------------------------------
(add-to-list 'default-frame-alist '(font . "fontset-default"))

(set-fontset-font "fontset-default"
                  'mule-unicode-0100-24ff
                  '("Arial Unicode MS*" . "iso10646-1"))

<snip>

Of course, "Amharic" and "Braille" cannot be displayed with "Arial Unicode MS", because it does not have them. But I hope to see the other 5 languages with it.

Is there any way to display them?
Or should I use other fonts?

Hindi and Malayalam are a tougher problem. Although the basic text is encoded in mule-unicode-0100-24ff, 'composition' properties are actually specified in the file. The composition property should provide renderable text and mark-up which replace the basic text in the display, which ideally should be totally unnecessary in an MS Windows system. (Realising this ideal requires the ability to upgrade the Uniscribe library to cover extra scripts and even newly admitted characters in supported scripts.) These compositions are defined by elements for the charset indian-glyph, and its characters have no specified Unicode equivalent. You need a non-Unicode font to display these characters. Arial Unicode MS does not contain much in the way of shaping tables, so it will not work properly for any of the truly 'complex' Indic scripts. (This may be why Microsoft seems to have abandoned this font.)

Tibetan and Lao also use the composition property, but in terms of characters in the same charset. However, I'm having display problems for Lao - see item 3 below. Tibetan won't display for me as I don't have a font that supports Tibetan.

I'm trying to understand how the input and display mechanisms of Emacs 22.1.1 work on Windows XP - I'm particularly interested in Indic scripts. My machine is set up with Thai as its 'ANSI' character set. I'm seeing some rather bizarre behaviours, and I'm having difficulty understanding them. Once I realised that Emacs was not accepting Unicode input from the keyboard, I tried to understand the built-in input methods. I investigated Lao input.

1. With the default font and the Windows keyboard set to Thai Kedmanee, Thai displays badly as it is typed. Bits of characters are left behind as the typing position moves rightwards faster than it should. However, when I switch to Code2000, a font with wide Unicode coverage, Thai displays as well as it does with native products such as Notepad. This may be because the alleged default font, Courier New, has no Thai glyphs, and so glyph metrics and glyphs bear no relationship to one another.

The Thai characters produced in this fashion are in one of the Unicode charsets (mule-unicode-0100-24ff).

2. My first discovery with Lao was that just selecting a font (Code2000) that supported Lao was not enough. It would not normally display Lao characters (in the Lao charset), until I discovered that a trick such as


(set-fontset-font "fontset-myfont" 'lao '("Code2000" . "iso10646-1"))

suddenly made the Lao text displayable. How does this work? I have studied the code of xdisp.c and its supporting functions, but I cannot find where Emacs character codes are converted to Unicode. I did notice that if I pasted Lao in from an MS application, Emacs would accept them as Unicode characters and they would be displayed properly if I selected an appropriate font.

3. Compositions of Lao characters (i.e. with the 'composition' string property) using the Code2000 font (the only fully working Lao font I have) do not display properly, whether they are in the Lao or mule-unicode-0100-24ff charset. With the latter I have seen left-hand parts of Hangul syllables displayed instead of Lao! Perhaps when I understand how uncomposed display does work, I will be able to understand this problem. At present I need to defeat the composition logic by typing consonant + vowel as <consonant, space, delete, vowel>! The text entered thus then displays properly, mocking the hard work that has gone into carefully composing grapheme clusters.
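
For anyone wanting to reproduce item 3, the sketch below shows one way to inspect the 'composition' property on a misdisplayed cluster, and to strip it as a possible alternative to the <consonant, space, delete, vowel> workaround. `find-composition` and `decompose-region` are standard functions from composite.el; whether decomposing actually cures the display on Windows is an untested assumption.

```elisp
;; Run each form with M-: (eval-expression) while point is on the
;; suspect Lao cluster.

;; Report the composition covering point, if any.  With the fourth
;; argument non-nil the result has the form
;;   (FROM TO COMPONENTS RELATIVE-P MOD-FUNC WIDTH)
;; and nil means no valid composition was found at point.
(find-composition (point) nil nil t)

;; Assumption: removing the composition property has the same effect
;; as typing the characters uncomposed.  This has not been verified.
(decompose-region (point-min) (point-max))
```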

4. When I explicitly specify that a buffer is to be saved in UTF-8 (or one of its variants), the Lao input method suddenly switches from generating Lao characters in the Lao charset to generating Lao characters in the mule-unicode-0100-24ff charset. How is this effect achieved? I can't work it out. Characters already stored in the Lao charset remain in the Lao charset in the buffer, as confirmed by C-x C-e (eval-last-sexp).

Bizarrely, selecting UTF-16 as the encoding for saving the buffer does not change the charset used by the Lao input method.
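
A check of the kind described in item 4 can be sketched as follows, assuming point is placed immediately before the character of interest. `char-charset` and `split-char` are standard Emacs 22 functions; the comments restate the two cases reported above and are not captured output.

```elisp
;; Run with M-: (eval-expression) so that point stays on the
;; character being inspected.

;; Name of the charset the character after point belongs to.
;; Per the report above, this is `lao' for text entered before the
;; buffer's coding system was set to UTF-8, and
;; `mule-unicode-0100-24ff' for text entered afterwards.
(char-charset (following-char))

;; The charset together with the character's position codes in it.
(split-char (following-char))

;; C-u C-x = (describe-char with a prefix argument) presents similar
;; information interactively.
```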

5. Possibly not news, but I have found that with a Uniscribe that supports Khmer, Unicode-encoded Khmer text pasted in to Emacs displays properly, including 'Indic rearrangement'. As far as I can tell, Emacs 22.1 has no support for Khmer! (Cursor positioning does look wrong for Khmer.) When I understand what is happening with Lao, I intend to write an input method for Khmer - unless I find Emacs on Windows has evolved to accepting UTF-16 as the coding system for keyboard input.

6. Latin ligaturing does not work. 'Caesar' with a ZWJ between 'a' and 'e' does not ligate even using a font for which it does ligate in Notepad. Perhaps that can get swept up with the handling of Unicode viramas, i.e. Indic conjuncts.
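
The test string in item 6 can be produced without keyboard support for ZWJ. This is a minimal sketch, assuming U+200D ZERO WIDTH JOINER is the ZWJ meant and that `decode-char` can map it (it falls inside the range covered by mule-unicode-0100-24ff).

```elisp
;; Insert "Ca<ZWJ>esar" at point.  `decode-char' with the `ucs'
;; argument converts a Unicode code point into an Emacs character;
;; it returns nil if the code point cannot be represented, in which
;; case `string' would signal an error.
(insert "Ca" (string (decode-char 'ucs #x200D)) "esar")
```

Evaluating the form in a scratch buffer and then switching to the font under test gives a quick comparison against what Notepad shows.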

Richard.
