bug#3616: 23.0.94; vc-bzr coding system bug

bug-gnu-emacs

From:	Ryan Duan
Subject:	bug#3616: 23.0.94; vc-bzr coding system bug
Date:	Mon, 22 Jun 2009 10:01:51 +0800

It works from the Windows XP command line, which uses the Windows ANSI
coding system.  The Windows command line seems to use cp936 as its
coding system.
The value of buffer-file-coding-system in the *shell* buffer is
chinese-gbk-dos, one of whose aliases is cp936-dos.  It doesn't help to
change it to either cp936 or chinese-iso-8bit.

I observe that the *shell* and *VC-log* buffers pass UTF-8 encoded
strings (is Emacs's internal buffer code UTF-8?) to the Windows command
line, which might be the real cause of this bug and other related bugs.
Three examples follow.

EXAMPLE 1
--------------------------------
In *shell*,
d:\code>bzr commit -m "第二"
bzr commit -m "绗 ��"
Traceback (most recent call last):
 File "bzr", line 130, in <module>
 File "bzrlib\commands.pyo", line 969, in main
bzrlib.errors.BzrError: Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is
unsupported by the current encoding.

Notice ''\xe7\xac\xac\xe4\xba\x8c'', which is the UTF-8 encoding of the
Chinese characters I typed.  This UTF-8 string is what caused the
above error.

Applying C-u C-x = to the Chinese character "第" gives:
       character: 第 (31532, #o75454, #x7b2c)
preferred charset: chinese-gbk (GBK Chinese simplified.)
      code point: 0xB5DA
          syntax: w    which means: word
        category:
                  .:Base, C:2-byte han, c:Chinese, h:Korean, j:Japanese, |:line breakable
     buffer code: #xE7 #xAC #xAC
       file code: #xB5 #xDA (encoded by coding system chinese-gbk-dos)
         display: by this font (glyph code)
   uniscribe:-outline-新宋体-normal-normal-normal-mono-13-*-*-*-c-*-gb2312.1980-0 (#x3100)

Notice that its buffer code is "\xe7\xac\xac", which is the first part
of ''\xe7\xac\xac\xe4\xba\x8c''.  The file code "\xb5\xda" is
chinese-gbk encoded, and is what I expect to be passed to the Windows
command line, where it might work correctly.  But unfortunately,
instead of passing a Chinese GBK encoded string to the shell, Emacs
passes a UTF-8 encoded string.
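
The two byte sequences discussed above can be reproduced outside
Emacs.  The following is a minimal Python sketch, not part of the
original report; it assumes that Python's "gbk" codec agrees with
Emacs's chinese-gbk for these two characters.

```python
# The commit message typed in EXAMPLE 1.
text = "第二"

# UTF-8 bytes: what the error message shows bzr actually received.
utf8_bytes = text.encode("utf-8")

# GBK bytes: what a cp936 console would be expected to receive.
gbk_bytes = text.encode("gbk")

print(utf8_bytes.hex(" "))  # e7 ac ac e4 ba 8c
print(gbk_bytes.hex(" "))   # b5 da b6 fe
```

The first line matches the bytes in the BzrError message, and the
second matches the file code reported by C-u C-x = together with the
GBK bytes for the second character.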

EXAMPLE 2
--------------------------------
In the *VC-log* buffer, I typed the same two Chinese characters "第二"
as in EXAMPLE 1.
After C-c C-c, the same error occurs: bzrlib.errors.BzrError:
Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is unsupported by the current
encoding.
Applying C-u C-x = to "第" returned the same information as in EXAMPLE 1.

EXAMPLE 3 (Another related bug)
--------------------------------
In Windows, I created a directory (folder) named "第二".
In dired, it works all right.
But in *shell*,
d:\>cd 第二
cd 绗 ��
系统找不到指定的路径。

It complains that the system cannot find the specified path, because
"\xb5\xda\xb6\xfe" (Chinese GBK) is converted to
''\xe7\xac\xac\xe4\xba\x8c'' (UTF-8) before being passed to the shell,
but the shell can only process Chinese GBK characters.
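
The garbled echo in the shell is consistent with UTF-8 bytes being
read as cp936.  The following Python sketch illustrates that
mismatch; it is an illustration only and assumes Python's "cp936"
codec approximates what the Windows console does with these bytes.

```python
# Bytes that Emacs appears to send for the directory name.
utf8_bytes = "第二".encode("utf-8")

# A cp936 console interprets those bytes as GBK.  Sequences that are
# not valid GBK are replaced with U+FFFD instead of raising an error.
shown = utf8_bytes.decode("cp936", errors="replace")

# ascii() escapes non-ASCII characters so the result prints on any console.
print(ascii(shown))

# The name the console would have understood: GBK bytes decoded as GBK.
expected = "第二".encode("cp936").decode("cp936")
print(shown == expected)  # False
```

Because the decoded text differs from the real directory name, the
lookup fails even though the directory exists.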

CONCLUSION
--------------------------------
When Emacs is used on Chinese Windows, Chinese GBK characters are
converted to UTF-8 before being passed to the Windows command line,
but the Windows command line cannot process UTF-8 characters, which
causes this bug and other related bugs.

I feel that this is not a small problem.  Emacs should detect the OS's
locale, then use the correct coding system to interact with the OS.
It seems to do well on Linux but badly on Windows.  Dired seems to do
well on Windows, but shell.el and vc-bzr.el do badly.  I didn't test
the other vc-* modes.
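
The "detect the locale, then encode accordingly" idea can be sketched
in Python as follows.  This is only an illustration of the principle,
not how Emacs implements process coding systems; the helper name
encode_for_console is hypothetical.

```python
import locale


def encode_for_console(text, encoding=None):
    """Encode text for a child process using the locale's encoding.

    Hypothetical helper for illustration only.  If no encoding is
    given, the locale's preferred encoding is used.
    """
    if encoding is None:
        # On a Simplified Chinese Windows system this is typically cp936.
        encoding = locale.getpreferredencoding(False)
    # Strict encoding: fail loudly rather than send bytes that the
    # console cannot interpret.
    return text.encode(encoding)


if __name__ == "__main__":
    print(locale.getpreferredencoding(False))
    # Passing cp936 explicitly reproduces the bytes expected in EXAMPLE 1.
    print(encode_for_console("第二", "cp936"))
```

With strict encoding, a message that cannot be represented in the
console's encoding raises UnicodeEncodeError up front instead of
reaching the child process as unreadable bytes.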

I hope the information above will help solve this problem.  Thank you!
HAPPY HACKING!

2009/6/19 Eli Zaretskii <eliz@gnu.org>:
>> Date: Fri, 19 Jun 2009 16:24:37 +0800
>> From: 端瑞 <duanpanda@gmail.com>
>> Cc:
>> Reply-To: 端瑞 <duanpanda@gmail.com>,
>>       3616@emacsbugs.donarmstrong.com

> Does it work for you from the command line?  If it does, what encoding
> of Chinese do you use in that case?
>
> What is the value of buffer-file-coding-system in the *shell* buffer?
> Does it help to change it to cp936?
