bug#14368: 24.3.50; Big screw: multibyte characters become unibyte

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
Date:	Sun, 12 May 2013 19:04:30 +0300

> Date: Sun, 12 May 2013 05:51:35 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 14368@debbugs.gnu.org
> 
> > Date: Sat, 11 May 2013 17:44:27 -0400
> > From: Richard Stallman <rms@gnu.org>
> > CC: 14368@debbugs.gnu.org
> > 
> >     Can you reproduce it starting with "emacs -Q"?
> > 
> > Yes.  I type
> > 
> > emacs -Q
> > C-\ latin-1-postfix RET
> > a ' C-a
> > 
> > and it fails
> 
> It doesn't fail for me, with yesterday's trunk.  C-a just moves to the
> beginning of the line, as expected.
> 
> Wait, I can reproduce this in a TTY session (the above was a GUI
> session).  I will try to look into it.

I found the reason, but I don't know enough about quail or input
decoding to suggest a solution.

The reason seems to be this changeset:

  112000: Stefan Monnier 2013-03-11 * src/keyboard.c: Move keyboard decoding to read_key_sequence.

The problem is that we now decode all input that comes from quail
(read_char calls input-method-function, and then read_decoded_char
decodes the result).

However, quail seems to work by deleting some characters from the
buffer, and then reinserting them, possibly after translation, as
instructed by the additional characters you type.  In this case,
typing "a '" inserts á, and quail then waits for another character.
Typing C-a at this point removes á from the buffer, and then sends two
events as input: a self-inserting character whose code is 225 decimal
(that's á), followed by the code 1, which is C-a.  (I don't know if
this is how quail is supposed to work; what I described is what I saw
in the debugger.  Perhaps Handa-san could comment on that.)

What happens next is that read_decoded_char attempts to decode 225,
which produces different results depending on the current keyboard
encoding: on GNU/Linux, we get an 8-bit raw byte \341 (that's octal
for 225), while on Windows with cp862 as the keyboard encoding, I get
ß.  C-a is executed as expected, but the net result is that á was
replaced by something else.
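
The effect described above can be imitated outside Emacs.  The sketch
below is Python, not Emacs code: the helper name redecode is
hypothetical, "utf-8" is only an assumed keyboard encoding for the
GNU/Linux case (the message does not name one), and Python's
surrogateescape error handler stands in for Emacs's raw 8-bit byte
representation.

```python
def redecode(event_code, keyboard_encoding):
    """Treat an already-decoded character code as one keyboard byte
    and decode it again (hypothetical helper, mimics the double decoding)."""
    raw = bytes([event_code])
    # surrogateescape keeps an undecodable byte as a lone surrogate,
    # used here as a stand-in for an Emacs raw byte.
    return raw.decode(keyboard_encoding, errors="surrogateescape")


EVENT = 225  # the code quail re-sends for á, per the message

# As a character code, 225 is already á; no decoding is needed.
original = chr(EVENT)

# Octal form of 225, matching the \341 raw byte reported on GNU/Linux.
octal = oct(EVENT)

# Windows case from the message: cp862 as the keyboard encoding.
on_cp862 = redecode(EVENT, "cp862")

# Assumed GNU/Linux case: byte 0xE1 alone is not valid UTF-8.
on_utf8 = redecode(EVENT, "utf-8")
```

With these assumptions, on_cp862 comes out as ß and on_utf8 as an
escaped lone byte, while original stays á, which matches the symptom
of á being replaced by something else.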

I'm not sure how to fix this cleanly.  One way would be to get quail
to encode the character events it sends, but then we have problems
with un-encodable characters.  Another way would be to somehow detect
that the character comes from quail and refrain from decoding it,
although I always thought that one of the goals of revision 112000 was
precisely to _allow_ decoding characters coming from quail.
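
To illustrate the trade-off in the first option, here is a rough
Python sketch, again not Emacs code.  The helper names encode_event
and round_trip are hypothetical, and cp862 is used only because it is
the keyboard encoding mentioned above.  Encoding before sending makes
the later decoding step harmless, but only for characters the
keyboard encoding can represent.

```python
def encode_event(ch, keyboard_encoding):
    """Return the bytes for ch in the keyboard encoding,
    or None if ch cannot be encoded (hypothetical helper)."""
    try:
        return ch.encode(keyboard_encoding)
    except UnicodeEncodeError:
        return None


def round_trip(ch, keyboard_encoding):
    """Encode ch as the input method would, then decode it as the
    keyboard reader would.  Returns None for un-encodable characters."""
    data = encode_event(ch, keyboard_encoding)
    if data is None:
        return None
    return data.decode(keyboard_encoding)


# á exists in cp862, so encoding first survives the later decoding.
ok = round_trip("á", "cp862")

# ő (U+0151) has no cp862 byte, so there is nothing to send.
missing = round_trip("ő", "cp862")
```

Here ok is á again, while missing is None: that second case is the
"un-encodable characters" problem, and this sketch does not suggest
how it should be handled.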

Stefan, can you take a look, please?
