bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#19994: 25.0.50; Unicode keyboard input on Windows


From: Ilya Zakharevich
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Tue, 3 Mar 2015 15:09:49 -0800
User-agent: Mutt/1.5.21 (2010-09-15)

I’m working on a patch to make Unicode keyboard input to work properly on
Windows (in graphic mode).  The problems with the current implementation 
stem from the facts that

  • on Windows, it IS possible to implement a bullet-proof system of Unicode
    input (at least, for GUI applications);

  • However, how to do it is completely undocumented.

      [See
        
http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Keyboard_input_on_Windows:_interaction_of_applications_and_the_kernel
      ]

So, essentially, all developers of applications try to design their own 
set of heuristical approaches which 

  • cover several keyboard layouts they can put their hands on;

  • more or less follow the design goals of their applications.

The approach taken by Emacs is to break the keyboard keys (VK’s) into 
several groups, and treat different groups differently.  Only the keys on
the main island of the keyboard may input characters.  Moreover, only 
the most common combinations of modifiers are allowed to be used for
the character input.  (In addition, there are plain bugs — like treating
UTF-16 as if it were UTF-32.)

  [I gave a very terse description on
     
https://groups.google.com/forum/?hl=en#!search/emacs$20keyboard$20windows$20ilya/gnu.emacs.help/ZHpZK2YfFuo/aAyZFUxrFeEJ
  ]

The “correct” approach should proceed in exactly the opposite direction:
if a keypress produces a character, it should be treated as a 
character — no matter where on the physical keyboard the key is residing,
and which modifiers were pressed.

The patch below

  • Implements this “primacy of characters” doctrine;
  
  • As far as I could see, is compatible with the current work of Emacs
    on “simple keyboard layouts”;
  
  • Worked at some moment (before I started a massive addition of 
    comments ;-] — and maybe it is still working, I did not touch it for a
    month);
  
  • (Currently) ignores the indent coding rules;
  
  • Passes all the test thrown at it by my super-puper-all-bells-and-whistles
    layouts; see e.g.
       http://k.ilyaz.org/windows/izKeys-visual-maps.html#examples
  
  • Is not bullet-proof: 
      ∘ I use one heuristic to detect which modifiers are “consumed” by the
        character input, and which are “on top” of character input;

      ∘ It does not (same as the current Emacs) support 
          Unicode-entered-by-Alt-numbers.
  
  • Does not fix a bug with UTF-16 of stand-alone (pumped to us) WM_CHAR’s.

If I ever find more time to work on it, I plan to:

  1) Add yet more documentation;

  2) Change a little bit the logic of detection of consumed/extra 
     modifiers.  This change may be cosmetic only — or maybe, with some 
     extremely devilous layouts, it may be beneficial.
     
     (I have not seen layouts where this change would matter, though!
      And I looked though the source code of hundred(s).)

  3) Bring it in sync with the Emacs coding style.

Meanwhile, I would greatly appreciate all input related to the current 
state of the patch.  (I *HOPE* that I did not break (many!) special cases
in the current implementation — but such things are hard to be sure in!)

Thanks for the parts of Emacs which ARE working great,
Ilya

=======================================================

--- w32fns.c-ini        2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c    2015-02-15 02:46:12.070091800 -0800
@@ -2832,6 +2832,126 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, int 
*ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  int i = buflen, doubled = 0, code_unit;      /* If doubled is at the end, 
ignore it */
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  while (buflen &&                             /* Should be called only when 
w32_unicode_gui */
+         PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, PM_NOREMOVE | 
PM_NOYIELD) &&
+         (msg.message == WM_CHAR || msg.message == WM_SYSCHAR || 
+          msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR || 
msg.message == WM_UNICHAR)) { /* Not contigious */
+    int dead;
+
+    GetMessageW(&msg, aWnd, msg.message, msg.message);
+    dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+    if (is_dead)
+      *is_dead = (dead ? msg.wParam : -1);
+    if (dead)
+      continue;
+    code_unit = msg.wParam;
+    if (doubled) {                             /* had surrogate */
+      if (msg.message == WM_UNICHAR || code_unit < 0xDC00 || code_unit > 
0xDFFF) {
+        /* Mismatched first surrogate.  Pass both code units as if they were 
two characters. */
+        *buf++ = doubled;
+        if (!--buflen) // Drop the second char if at the end of the buffer
+          return i;
+      } else {
+        code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+      }
+      doubled = 0;
+    } else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
+      doubled = code_unit;
+      continue;
+    }    /* We handle mismatched second surrogate the same as a normal 
character. */
+    /* The only "fake" characters delivered by ToUnicode() or 
TranslateMessage() are: 
+       0x01 .. 0x1a for Control-chars, 
+       0x00 and 0x1b .. 0x1f for Control- []\@^_ 
+       0x7f for Control-BackSpace
+       0x20 for Control-Space */
+    if (ignore_ctrl && (code_unit < 0x20 || code_unit == 0x7f || (code_unit == 
0x20 && ctrl))) {
+      /* Non-character payload in a WM_CHAR (Ctrl-something pressed).  Ignore. 
*/
+      if (ctrl_cnt)
+        *ctrl_cnt++;
+      continue;
+    }
+    if (code_unit < 0x7f && 
+        ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) ||
+         (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                   vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) &&
+         strchr("0123456789/*-+.,", code_unit))        /* Traditionally, Emacs 
translates these to characters later, in `self-insert-character' */
+       continue;
+    *buf++ = code_unit;
+    buflen--;
+  }
+  return i - buflen;
+}
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, UINT 
lParam)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code 
points to a keypress. 
+     (However, the "old style" TranslateMessage() would deliver at most 16 of 
them.)  Be on a
+     safe side, and prepare to treat many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  if (do_translate) {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by who knows 
what; be conservative. */
+                        modifier_set (VK_LCONTROL) || modifier_set 
(VK_RCONTROL) || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, (lParam & 0x1000000L) != 
0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate ? w32_get_key_modifiers (wParam, lParam) : 
0;
+
+    /* What follows is just heuristics; the correct treatement requires 
non-destructive ToUnicode(). */
+    if (wmsg.dwModifiers & ctrl_modifier)      /* If ctrl-something delivers 
chars, ctrl and the rest should be hidden */
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt is not changing the character.  
Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) &&   /* If 
alt-something delivers non-ASCIIchars, alt should be hidden */
+        count == 1 && *b < 0x10000) {
+      SHORT r = VkKeyScanW( *b );
+
+      fprintf(stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam);
+      if ((r & 0xFF) == wParam && !(r & ~0x1FF)) {     /* Char available 
without Alt modifier, so Alt is "on top" */
+         if (*b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+           return 0;                                   /* Another branch below 
would convert it to Alt-Latin char via wParam */        
+         strip_Alt = 0;
+      }
+    }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+    
+    signal_user_input ();
+    while (count--)
+      {
+        fprintf(stderr, "unichar %#06x\n", *b);
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt)     /* Process ALSO as ctrl */
+      return 1;
+    else
+        fprintf(stderr, "extra ctrl char\n");
+    return -1;
+  } else if (is_dead >= 0) {
+      fprintf(stderr, "dead %#06x\n", is_dead);
+      return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3127,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3236,45 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
            wParam = VK_NUMLOCK;
          break;
        default:
+         if (w32_unicode_gui) {        
+           /* If this event generates characters or deadkeys, do not interpret 
+              it as a "raw combination of modifiers and keysym".  Hide  
+              deadkeys, and use the generated character(s) instead of the  
+              keysym.   (Backward compatibility: exceptions for numpad keys 
+              generating 0-9 . , / * - +, and for extra-Alt combined with a 
+              non-Latin char.) 
+              
+              Try to not report modifiers which have effect on which 
+              character or deadkey is generated.
+              
+              Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+              keyboard layout), and Ctrl, leftAlt do not affect the generated
+              character, one wants to report Ctrl-leftAlt-f if the user 
+              presses Ctrl-leftAlt-rightAlt-?. */
+           int res; 
+#if 0
+           /* Some of WM_CHAR may be fed to us directly, some are results of 
+              TranslateMessage().  Using 0 as the first argument (in a 
+              separate call) might help us distinguish these two cases.
+
+              However, the keypress feeders would most probably expect the
+              "standard" message pump, when TranslateMessage() is called on 
+              EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+              Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+              Using 0 as the first argument would interfere with this.  */
+           deliver_wm_chars (0, hwnd, msg, wParam, lParam);
+#endif
+           /* Processing the generated WM_CHAR messages *WHILE* we handle 
+              KEYDOWN/UP event is the best choice, since withoug any fuss, 
+              we know all 3 of: scancode, virtual keycode, and expansion. 
+              (Additionally, one knows boundaries of expansion of different
+              keypresses.) */
+           res = deliver_wm_chars (1, hwnd, msg, wParam, lParam);
+           windows_translate = -( res != 0 );
+           if (res > 0)                /* Bound to character(s) or a deadkey */
+             break;
+         }                             /* Some branches after this one may be 
not needed */
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
          /* If not defined as a function key, change it to a WM_CHAR message. 
*/
          if (wParam > 255 || !lispy_function_keys[wParam])
            {
@@ -3184,6 +3342,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
            }
        }
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
        {


=======================================================



In GNU Emacs 25.0.50.20 (i686-pc-mingw32)
 of 2015-02-08 on BUCEFAL
Repository revision: d5e3922e08587e7eb9e5aec2e9f84cbda405f857
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/k/test'

Configured features:
SOUND NOTIFY ACL

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util help-fns mail-prsvr mail-utils time-date tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp disp-table w32-win w32-vars tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer cl-preloaded nadvice loaddefs button faces cus-face macroexp
files text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
w32notify w32 multi-tty emacs)

Memory information:
((conses 8 80324 9864)
 (symbols 32 17968 0)
 (miscs 32 85 128)
 (strings 16 12688 4007)
 (string-bytes 1 324435)
 (vectors 8 9470)
 (vector-slots 4 390690 6074)
 (floats 8 65 62)
 (intervals 28 243 45)
 (buffers 516 13))





reply via email to

[Prev in Thread] Current Thread [Next in Thread]