bug-gnu-emacs

From: Joakim Hårsman
Subject: bug#10299: Emacs doesn't handle Unicode characters in keyboard layout on MS Windows
Date: Thu, 15 Dec 2011 08:53:15 +0100

On 15 December 2011 07:22, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Wed, 14 Dec 2011 21:39:28 +0100
>> From: Joakim Hårsman <joakim.harsman@gmail.com>
>>
>> However, Emacs doesn't seem to handle the case when the keyboard
>> layout contains characters not available in the ANSI code page, and
>> just prints a question mark character instead.
>
> Yes, Emacs on Windows uses the ANSI codepage to read the keyboard
> input.  Does it help to play with the value of keyboard-coding-system?

No, changing keyboard-coding-system doesn't help, and utf-16le-dos
isn't a valid setting for keyboard-coding-system anyway.

>> For certain characters,
>> a character that is visually similar to the actual character is
>> printed instead of a question mark. For example, if I use a layout
>> where AltGr+O produces U+2218 RING OPERATOR, Emacs prints U+00B0
>> DEGREE SIGN instead. The degree sign is available in Windows 1252,
>> the default ANSI code page on my system, but the ring operator
>> isn't.
>
> I'm guessing that this is Windows trying to translate the characters
> to the ANSI codepage behind the scenes.
>
>> However, if the layout maps AltGr+R to U+220A SMALL ELEMENT OF, Emacs
>> just prints a question mark, presumably because Windows 1252 doesn't
>> contain a reasonable replacement for that character.
>
> Will inputting these characters with "C-x 8 RET 0220a RET" or "C-x 8
> RET SMALL ELEMENT OF RET" be a good enough solution for you?  You can
> input any Unicode character by its name or codepoint using "C-x 8 RET".

Using C-x 8 is too cumbersome. I guess I could write my own custom
Emacs input method, but since Emacs now has good support for Unicode,
it would seem easier if it handled Unicode key events from the OS
correctly.

>> I'd be happy to help debug this but I have no idea where to even
>> start. Is there an easy way to find out if it's the C code that
>> clobbers the character or if it happens in lisp for example?
>
> I don't think there is any "clobbering".  Emacs deliberately converts the
> Unicode characters to the current locale's ANSI codepage.  I think
> (but I'm not sure) the reason is that Emacs cannot use UTF-16 for
> keyboard input.  Perhaps Jason and Handa-san could comment on this.

I really don't know my way around the Emacs source, but a quick look
at w32_kbd_patch_key in w32inevt.c seems to indicate that Emacs really
is decoding the Unicode character event correctly: both
uChar.UnicodeChar and uChar.AsciiChar appear to be set to the right
values.

  /* On NT, call ToUnicode instead and then convert to the current
     locale's default codepage.  */
  if (os_subtype == OS_NT)
    {
      WCHAR buf[128];

      isdead = ToUnicode (event->wVirtualKeyCode, event->wVirtualScanCode,
                          keystate, buf, 128, 0);
      if (isdead > 0)
        {
          char cp[20];
          int cpId;

          event->uChar.UnicodeChar = buf[isdead - 1];

          GetLocaleInfo (GetThreadLocale (),
                         LOCALE_IDEFAULTANSICODEPAGE, cp, 20);
          cpId = atoi (cp);
          isdead = WideCharToMultiByte (cpId, 0, buf, isdead,
                                        ansi_code, 4, NULL, NULL);
        }
      else
        isdead = 0;
    }

However, this bit from w32_wnd_proc in w32fns.c looks suspicious to me:

                  else
                    {
                      /* Try to handle other keystrokes by determining the
                         base character (ie. translating the base key plus
                         shift modifier).  */
                      int add;
                      KEY_EVENT_RECORD key;

                      key.bKeyDown = TRUE;
                      key.wRepeatCount = 1;
                      key.wVirtualKeyCode = wParam;
                      key.wVirtualScanCode = (lParam & 0xFF0000) >> 16;
                      key.uChar.AsciiChar = 0;
                      key.dwControlKeyState = modifiers;

                      add = w32_kbd_patch_key (&key);
                      /* 0 means an unrecognized keycode, negative means
                         dead key.  Ignore both.  */
                      while (--add >= 0)
                        {
                          /* Forward asciified character sequence.  */
                          post_character_message
                            (hwnd, WM_CHAR,
                             (unsigned char) key.uChar.AsciiChar, lParam,
                             w32_get_key_modifiers (wParam, lParam));
                          w32_kbd_patch_key (&key);
                        }
                      return 0;
                    }

It looks like it's re-posting the event with just the ASCII character
code, clobbering the Unicode info originally in wParam. Or maybe the
idea is to translate characters whose ANSI encoding requires multiple
bytes into multiple events?




