bug-gnu-emacs

bug#19994: 25.0.50; Unicode keyboard input on Windows


From: Ilya Zakharevich
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Wed, 1 Jul 2015 03:07:12 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > Date: Tue, 3 Mar 2015 15:09:49 -0800
> > From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
> > 
> > I’m working on a patch to make Unicode keyboard input to work properly on
> > Windows (in graphic mode).

> I suggest, indeed, to clean up the code so we could commit it to the
> master branch.  That way, it will get wider testing, and we can fix
> whatever problems it might cause.  Any deficiencies that don't cause
> regressions wrt the current code can be fixed later, or even not at
> all (if we decide them to not be important enough).

I had no time to work on the code itself, but
  • I fixed the formatting,
  • I pumped up the docs,
  • I put in the suggested eassert().

----------------

As before, the patch
  • defines two new functions, the static get_wm_chars() and
    deliver_wm_chars(),
  • delays the modification of wParam by map_keypad_keys() until it is
    needed (moves 1 LoC in w32_wnd_proc()), and
  • adds 8 LoC to w32_wnd_proc().
The calls to these functions are conditional on w32_unicode_gui.
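For illustration, the lines added to w32_wnd_proc() can be modeled as a small function of the value returned by deliver_wm_chars(). That value is 1 when characters or a dead key were delivered, -1 when characters were delivered but Ctrl payload was also seen, and 0 when nothing was delivered (or the key is left to the legacy Alt+Latin path). This is a standalone sketch, not code from the patch; the enum and route_keydown() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical names for the three outcomes of a WM_KEYDOWN.  */
enum keydown_route
{
  ROUTE_DONE,                 /* res > 0: chars or dead key handled.  */
  ROUTE_LEGACY_NO_TRANSLATE,  /* res < 0: run old code, skip `translate:'.  */
  ROUTE_LEGACY                /* res == 0: old code path as before.  */
};

/* Models the lines added to w32_wnd_proc(): RES is the value returned
   by deliver_wm_chars(); *WINDOWS_TRANSLATE receives the value the
   patch stores in the `windows_translate' variable.  */
static enum keydown_route
route_keydown (int res, int *windows_translate)
{
  assert (res >= -1 && res <= 1);
  /* -1 makes the added `if (windows_translate == -1) break;' skip
     the `translate:' block, since TranslateMessage() already ran.  */
  *windows_translate = -(res != 0);
  if (res > 0)
    return ROUTE_DONE;
  return res < 0 ? ROUTE_LEGACY_NO_TRANSLATE : ROUTE_LEGACY;
}
```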

Enjoy,
Ilya

--- w32fns.c-ini        2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c    2015-07-01 02:56:30.787672000 -0700
@@ -2832,6 +2832,233 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl, 
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert(w32_unicode_gui);
+  while (buflen
+        /* Should be called only when w32_unicode_gui: */
+         && PeekMessageW(&msg, aWnd, WM_KEYFIRST, WM_KEYLAST, 
+                     PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR 
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR 
+             || msg.message == WM_UNICHAR)) 
+    { 
+      /* We extract the character payload, but in this call we handle only
+         the characters which come BEFORE the next keyup/keydown message. */
+      int dead;
+
+      GetMessageW(&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled) 
+        { 
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR 
+              || code_unit < 0xDC00 || code_unit > 0xDFFF) 
+            { /* Mismatched first surrogate.  
+                 Pass both code units as if they were two characters. */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer. */
+            } 
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        } 
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF) 
+        {    
+          /* Handle mismatched 2nd surrogate the same as a normal character. */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or
+         TranslateMessage() are:
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Backspace
+         0x00 and 0x1b .. 0x1f for Control- @[\]^_ (0x1b is also Esc)
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl 
+          && (code_unit < 0x20 || code_unit == 0x7f 
+              || (code_unit == 0x20 && ctrl))) 
+        { 
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report. */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD*
+         keys, and would treat them later via `function-key-map'.  In addition
+         to the usual 102-key NUMPAD keys, this map also treats `kp-' variants
+         of space, tab, enter, separator, equal.  TAB and EQUAL, apparently,
+         cannot be generated on the Win-GUI branch.  ENTER is already handled
+         by the code above.  According to `lispy_function_keys', kp-space is
+         generated by non-extended VK_CLEAR.  (kp-tab != VK_OEM_NEC_EQUAL!)
+
+         We do the same for backward compatibility, but ignore only the
+         characters that `function-key-map' can restore later. */
+      if (code_unit < 0x7f 
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE) 
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) || 
+                     vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR))) 
+          && strchr("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+#  define FPRINTF_WM_CHARS(ARG)        fprintf ARG
+#else
+#  define FPRINTF_WM_CHARS(ARG)        0
+#endif
+
+int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam, 
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code
+     units to a keypress.
+     (However, the "old style" TranslateMessage() would deliver at most 16 of
+     them.)  To be on the safe side, prepare to handle many more. */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  /* Since the keypress processing logic of Windows has a lot of state, it 
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+     
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing 
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed - and harms a lot). 
+     
+     So, with the usual message pump, the following call to TranslateMessage() 
+     is not needed (and is going to be VERY harmful).  With Emacs' message 
+     pump, the call is needed.  */
+  if (do_translate) {
+      MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+      windows_msg.time = GetMessageTime ();
+      TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof(buf)/sizeof(*buf), 1,
+                        /* The message may have been synthesized by 
+                           who knows what; be conservative. */
+                        modifier_set (VK_LCONTROL) 
+                          || modifier_set (VK_RCONTROL) 
+                          || modifier_set (VK_CONTROL), 
+                        &ctrl_cnt, &is_dead, wParam, 
+                        (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate 
+       ? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires
+       non-destructive ToUnicode():
+         http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+       What one needs to find is: 
+         * which of the present modifiers AFFECT the resulting char(s) 
+           (so should be stripped, since their EFFECT is "already
+            taken into account" in the string in buf), and 
+         * which modifiers are not affecting buf, so should be reported to
+           the application for further treatment.
+       
+       Example: assume that we know:
+         (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+             ("may be logical" with a JCUKEN-flavored Russian keyboard layout);
+         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+         (C) Win-modifier is not affecting the produced character 
+             (this is the common case: happens with all "standard" layouts).
+
+       Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+       What is the intent of the user?  We need to guess the intent to decide  
+       which event to deliver to the application.
+       
+       This looks like reasonable logic: since the Win modifier does not affect
+       the output string, the user was pressing Win for SOME OTHER purpose.
+       So the user wanted to generate a Win-SOMETHING event.  Now, what is
+       SOMETHING?  If one takes the mantra that "character payload is more
+       important than the combination of keypresses which resulted in this 
+       payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+       assume that the user wanted to generate Win-f.
+       
+       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
+       is out of the question.  So we use heuristics (hopefully, covering 99.9999%
+       of cases).
+     */
+    
+    /* If ctrl-something delivers chars, ctrl and the rest (except Shift)
+       should be hidden, so the key-event consumer won't see an accelerator. */
+    if (wmsg.dwModifiers & ctrl_modifier)
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt does not change the character.
+       Unless we are in this situation, strip Alt/Meta. */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier) 
+        /* If alt-something delivers non-ASCII chars, alt should be hidden */
+        && count == 1 && *b < 0x10000) 
+      {
+        SHORT r = VkKeyScanW( *b );
+
+        FPRINTF_WM_CHARS((stderr, "VkKeyScanW %#06x %#04x\n", (int)r, wParam));
+        if ((r & 0xFF) == wParam && !(r & ~0x1FF)) 
+          {    
+            /* Char available without Alt modifier, so Alt is "on top" */
+            if (legacy_alt_meta 
+                && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+             /* For backward-compatibility with older Emacsen, let
+                this be processed by another branch below (which would convert 
+                it to Alt-Latin char via wParam). */
+              return 0;
+            strip_Alt = 0;
+          }
+      }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+    
+    signal_user_input ();
+    while (count--)
+      {
+        FPRINTF_WM_CHARS((stderr, "unichar %#06x\n", *b));
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt)
+      return 1;
+    /* Ctrl payload was seen as well: process ALSO as ctrl.  */
+    FPRINTF_WM_CHARS((stderr, "extra ctrl char\n"));
+    return -1;
+  } else if (is_dead >= 0) {
+      FPRINTF_WM_CHARS((stderr, "dead %#06x\n", is_dead));
+      return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3234,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3343,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
            wParam = VK_NUMLOCK;
          break;
        default:
+         if (w32_unicode_gui) {        
+           /* If this event generates characters or deadkeys, do not interpret 
+              it as a "raw combination of modifiers and keysym".  Hide  
+              deadkeys, and use the generated character(s) instead of the  
+              keysym.   (Backward compatibility: exceptions for numpad keys 
+              generating 0-9 . , / * - +, and for extra-Alt combined with a 
+              non-Latin char.) 
+              
+              Try to not report modifiers which have effect on which 
+              character or deadkey is generated.
+              
+              Example (contrived): if rightAlt-? generates f (on a Cyrillic 
+              keyboard layout), and Ctrl, leftAlt do not affect the generated
+              character, one wants to report Ctrl-leftAlt-f if the user 
+              presses Ctrl-leftAlt-rightAlt-?. */
+           int res; 
+#if 0
+           /* Some of WM_CHAR may be fed to us directly, some are results of 
+              TranslateMessage().  Using 0 as the first argument (in a 
+              separate call) might help us distinguish these two cases.
+
+              However, the keypress feeders would most probably expect the
+              "standard" message pump, when TranslateMessage() is called on 
+              EVERY KeyDown/Keyup event.  So they may feed us Down-Ctrl
+              Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+              Using 0 as the first argument would interfere with this.  */
+           deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+           /* Processing the generated WM_CHAR messages *WHILE* we handle
+              a KEYDOWN/UP event is the best choice, since without any fuss,
+              we know all 3 of: scancode, virtual keycode, and expansion. 
+              (Additionally, one knows boundaries of expansion of different
+              keypresses.) */
+           res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+           windows_translate = -(res != 0);
+           if (res > 0) /* Bound to character(s) or a deadkey */
+             break;
+           /* deliver_wm_chars() may make some branches after this vestigial */
+         }
+         wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
          /* If not defined as a function key, change it to a WM_CHAR message. */
          if (wParam > 255 || !lispy_function_keys[wParam])
            {
@@ -3184,6 +3450,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
            }
        }
 
+    if (windows_translate == -1)
+      break;
     translate:
       if (windows_translate)
        {

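The surrogate handling in get_wm_chars() can be checked in isolation. This is a standalone model: pair_surrogates() is a hypothetical name, and the Windows message loop, the WM_UNICHAR check and the buffer-length handling are left out. It applies the same rules to an array of UTF-16 code units: a high surrogate is held, a following low surrogate completes it via the folded constant 0x35FDC00 = (0xD800 << 10) + 0xDC00 - 0x10000, a mismatch emits both units unchanged, and a trailing high surrogate is dropped.

```c
#include <assert.h>

/* Sketch of the pairing rules in get_wm_chars(), minus the Windows
   message loop: IN holds N UTF-16 code units (as delivered one per
   WM_CHAR), OUT receives the resulting characters.  Returns the
   number of characters written.  */
static int
pair_surrogates (const int *in, int n, int *out)
{
  int pending = 0;  /* Held high surrogate, like `doubled'.  */
  int count = 0;

  assert (n >= 0);
  for (int k = 0; k < n; k++)
    {
      int cu = in[k];

      if (pending)
        {
          if (cu >= 0xDC00 && cu <= 0xDFFF)
            /* Well-formed pair; same as
               0x10000 + ((pending - 0xD800) << 10) + (cu - 0xDC00).  */
            cu = (pending << 10) + cu - 0x35FDC00;
          else
            out[count++] = pending;  /* Mismatch: emit both units as-is.  */
          pending = 0;
        }
      else if (cu >= 0xD800 && cu <= 0xDBFF)
        {
          pending = cu;  /* Wait for the low surrogate.  */
          continue;
        }
      out[count++] = cu;  /* Also covers a lone low surrogate.  */
    }
  return count;  /* A trailing high surrogate is dropped.  */
}
```

For example, the units 0xD83D 0xDE00 yield the single character U+1F600, while 0xD83D 0x0041 yields two characters, 0xD83D and 0x0041.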

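The ignore_ctrl test in get_wm_chars() can be modeled the same way. In this sketch (hypothetical names, no Windows calls), code units below 0x20, 0x7f, and a space typed with Ctrl are counted rather than kept. The counter is bumped as (*cnt)++; writing *cnt++ would advance the pointer instead and leave the count at 0.

```c
#include <assert.h>

/* Mirrors the ignore_ctrl test in get_wm_chars(): nonzero if CODE_UNIT
   is one of the "fake" characters Windows synthesizes for Ctrl chords.  */
static int
is_ctrl_payload (int code_unit, int ctrl_down)
{
  return (code_unit < 0x20 || code_unit == 0x7f
          || (code_unit == 0x20 && ctrl_down));
}

/* Copies the real characters from IN to OUT and counts the dropped
   ones through CNT, as get_wm_chars() does with ctrl_cnt.  */
static int
keep_real_chars (const int *in, int n, int ctrl_down, int *out, int *cnt)
{
  int count = 0;

  assert (n >= 0);
  if (cnt)
    *cnt = 0;
  for (int k = 0; k < n; k++)
    {
      if (is_ctrl_payload (in[k], ctrl_down))
        {
          if (cnt)
            (*cnt)++;  /* Increment the counter, not the pointer.  */
          continue;
        }
      out[count++] = in[k];
    }
  return count;
}
```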

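The backward-compatibility filter for keypad keys reduces to a predicate on the code unit, the virtual key and the extended-key flag (bit 24 of lParam). The sketch copies the virtual-key values from <winuser.h> so that it builds without the Windows headers; drop_keypad_char() is a hypothetical name, and the condition simply mirrors the one in the patch. One detail worth noting: strchr() also matches the terminating NUL, so a code unit of 0 has to be excluded explicitly (when ignore_ctrl is set, as in deliver_wm_chars(), 0 is already consumed by the Ctrl check before this point).

```c
#include <assert.h>
#include <string.h>

/* Virtual-key codes as defined in <winuser.h>; copied here so the
   sketch builds without the Windows headers.  */
enum
{
  VK_CLEAR = 0x0C,
  VK_PRIOR = 0x21,    /* VK_PRIOR..VK_DOWN: PgUp, PgDn, End, Home, arrows.  */
  VK_DOWN = 0x28,
  VK_INSERT = 0x2D,
  VK_DELETE = 0x2E,
  VK_NUMPAD0 = 0x60,  /* VK_NUMPAD0..VK_DIVIDE: digits, * + sep - . /  */
  VK_DIVIDE = 0x6F
};

/* Nonzero if CODE_UNIT produced by key VK (EXTENDED: bit 24 of lParam)
   should be dropped, leaving the key to `function-key-map'.  */
static int
drop_keypad_char (int code_unit, int vk, int extended)
{
  int keypad_key;

  assert (vk >= 0 && vk <= 0xFF);
  keypad_key = ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE)
                || (extended && ((vk >= VK_PRIOR && vk <= VK_DOWN)
                                 || vk == VK_INSERT || vk == VK_DELETE
                                 || vk == VK_CLEAR)));
  /* strchr() would also "find" the terminating NUL, so exclude 0.  */
  return (code_unit > 0 && code_unit < 0x7f && keypad_key
          && strchr ("0123456789/*-+.,", code_unit) != NULL);
}
```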

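The Alt check in deliver_wm_chars() relies on the layout of the VkKeyScanW() result: the low byte is the virtual key, the high byte the shift state (1 = Shift, 2 = Ctrl, 4 = Alt), and -1 means that no key produces the character. A sketch of the test `(r & 0xFF) == wParam && !(r & ~0x1FF)` as a standalone predicate; alt_is_on_top() is a hypothetical name, and int16_t stands in for SHORT.

```c
#include <assert.h>
#include <stdint.h>

/* Interprets a VkKeyScanW() result R the way deliver_wm_chars() does.
   Nonzero if the character is produced by the same key VK with at most
   Shift held, i.e. Alt did not contribute to it.  R == -1 ("no such
   key") fails the test because its low byte is 0xFF and its high bits
   are set.  */
static int
alt_is_on_top (int16_t r, int vk)
{
  assert (vk >= 0 && vk <= 0xFF);
  return (r & 0xFF) == vk && !(r & ~0x1FF);
}
```

For instance, 0x0041 (key A, no modifier) and 0x0141 (key A with Shift) pass for VK 0x41, while 0x0641 (key A with Ctrl+Alt, i.e. AltGr) does not.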

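Putting the two modifier heuristics of deliver_wm_chars() together: a Ctrl chord that still produced characters is reported with Shift only, and Alt/Meta is dropped unless the character is reachable without it. The bit values below are made up for this sketch (Emacs has its own modifier bits), reported_modifiers() is a hypothetical name, and the caller is assumed to pass alt_on_top = 0 unless the VkKeyScanW() test succeeded for a single BMP character. The legacy_alt_meta early return is not modeled.

```c
#include <assert.h>

/* Made-up modifier bits for this sketch only; Emacs uses its own.  */
enum
{
  SHIFT_MOD = 1 << 0,
  CTRL_MOD = 1 << 1,
  ALT_MOD = 1 << 2,
  META_MOD = 1 << 3,
  SUPER_MOD = 1 << 4  /* Stands for any other modifier.  */
};

/* Modifiers reported with the character(s) a keypress produced.
   MODS: modifiers held down.  ALT_ON_TOP: nonzero if the character
   is reachable without Alt.  */
static int
reported_modifiers (int mods, int alt_on_top)
{
  assert (mods >= 0);
  /* A Ctrl chord that still produced characters: keep only Shift,
     so the event is not taken for an accelerator.  */
  if (mods & CTRL_MOD)
    mods &= SHIFT_MOD;
  /* Alt/Meta survives only if it did not affect the character.  */
  if (!alt_on_top)
    mods &= ~(ALT_MOD | META_MOD);
  return mods;
}
```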