emacs-devel

Re: Fcall_process: wrong conversion


From: Herbert Euler
Subject: Re: Fcall_process: wrong conversion
Date: Mon, 15 May 2006 23:17:06 +0800

I followed these steps:

   - Create a file containing UTF-16 text; either UTF-16BE or UTF-16LE
     works.  For example, create a file named "1" whose content is "a"
     encoded in UTF-16LE.

   - Visit file "1" with C-x C-f.

In fact, a UTF-16 file can be interpreted either as UTF-16 text or as
ASCII text containing non-ASCII bytes.  Decoded as UTF-16LE, the
content of file "1" is "a"; read as raw bytes, it is "\377\376a^@".
Here "\377\376" is the byte order mark indicating UTF-16LE, and "a" is
represented as "a^@" (^@ is the NUL byte \0).  If for some reason
Emacs doesn't visit the file with the correct coding system, type
C-x RET r followed by the correct coding system and RET to revert the
buffer with it.

   - If the buffer's coding system is raw-text-unix, the content is
     displayed as "\377\376a^@".  Type M-x hexl-mode RET and the
     correct result is displayed (not shown here, since it's easy to
     reproduce).

   - If the buffer's coding system is utf-16-le, the content is
     displayed as "a".  Type M-x hexl-mode RET and the buffer shows

         \377?: Invalid argument

     instead of the expected hex dump.
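The two interpretations of file "1" can be reproduced outside Emacs.  This is a minimal Python sketch, assuming the file holds exactly a UTF-16LE byte order mark followed by "a"; `emacs_raw_display` is a hypothetical helper that only approximates how Emacs shows raw bytes (caret notation for control characters, octal escapes for bytes >= 0x80).

```python
import codecs

def emacs_raw_display(data: bytes) -> str:
    # Hypothetical helper, a rough approximation of how an Emacs buffer
    # shows undecoded bytes (e.g. under raw-text-unix).
    out = []
    for b in data:
        if b < 0x20 or b == 0x7F:
            # Control characters use caret notation: 0 -> ^@, 0x7F -> ^?
            out.append("^" + chr(b ^ 0x40))
        elif b < 0x80:
            out.append(chr(b))          # printable ASCII as-is
        else:
            out.append("\\%o" % b)      # high bytes as octal escapes
    return "".join(out)

# Assumed content of file "1": UTF-16LE BOM followed by "a" in UTF-16LE.
data = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")

print(data)                      # b'\xff\xfea\x00'
print(emacs_raw_display(data))   # \377\376a^@   (the raw-text-unix view)
print(data.decode("utf-16"))     # a             (the BOM selects little-endian)
```

The same four bytes yield both views; only the coding system used to decode them differs.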

The error occurs because hexl-mode does its job as follows:

   1. Store the buffer content in a temporary file.

   2. Invoke "hexl" with argument "-hex" and stdin set to the
      temporary file, and put its output into the same buffer.  This
      is done by calling `call-process-region' (and so
      `call-process').

   3. Process the output to produce the final result.
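The three steps can be modeled outside Emacs.  The Python sketch below is not hexl.el's code: the external "hexl" program is replaced by a hypothetical stand-in (a Python one-liner that prints its stdin as hexadecimal and ignores its arguments), and step 3 is reduced to decoding the output.  It only illustrates that the buffer content reaches the child through a temporary file on stdin, while the arguments travel separately on the command line.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the external "hexl" program: it ignores its
# arguments and writes its stdin as hex.  The real hexl is not used here.
HEXL_STANDIN = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.buffer.read().hex())"]

def hexlify_buffer(buffer_bytes: bytes, args: list) -> str:
    # Step 1: store the (already encoded) buffer content in a temporary file.
    with tempfile.TemporaryFile() as tmp:
        tmp.write(buffer_bytes)
        tmp.flush()
        tmp.seek(0)
        # Step 2: run the program with the temp file as stdin, capturing
        # stdout; the arguments are passed on the command line.
        result = subprocess.run(HEXL_STANDIN + args, stdin=tmp,
                                stdout=subprocess.PIPE, check=True)
    # Step 3 (placeholder): turn the output into the displayed text.
    return result.stdout.decode("ascii")

print(hexlify_buffer(b"\xff\xfea\x00", ["-hex"]))   # fffe6100
```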

When the buffer's coding system is raw-text-unix, the code of
`Fcall_process' in callproc.c shown in the last mail does not convert
the argument "-hex", so the command actually invoked is "hexl -hex".
But if the buffer's coding system is utf-16-le, "-hex" is converted to
"\377\376-^@h^@e^@x^@", so the command invoked is
"hexl \377\376-^@h^@e^@x^@".  Since "^@" is actually '\0', "hexl"
sees only "\377\376-" as its first argument.  That's why an error
message is displayed in the second case, and the rest of hexl-mode
(step 3) cannot process this wrong output correctly.
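The argument mangling can be modeled in a few lines.  This Python sketch assumes, as the "\377\376" bytes in this message suggest, that the utf-16-le coding system writes a little-endian byte order mark before the UTF-16LE code units; `encode_utf_16_le_with_bom` and `seen_by_c_program` are hypothetical names used only for illustration.

```python
import codecs

def encode_utf_16_le_with_bom(text: str) -> bytes:
    # Assumption: models the utf-16-le coding system as observed in this
    # report, i.e. a BOM (\377\376) followed by UTF-16LE code units.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def seen_by_c_program(arg: bytes) -> bytes:
    # A C program receives each argv entry as a NUL-terminated string,
    # so every byte from the first NUL onward is invisible to it.
    return arg.split(b"\0", 1)[0]

# raw-text-unix: "-hex" is passed through unchanged.
print(seen_by_c_program(b"-hex"))        # b'-hex'

# utf-16-le: "-hex" becomes \377\376-^@h^@e^@x^@ ...
encoded = encode_utf_16_le_with_bom("-hex")
print(encoded)                           # b'\xff\xfe-\x00h\x00e\x00x\x00'
# ... and the program only sees \377\376-
print(seen_by_c_program(encoded))        # b'\xff\xfe-'
```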

I hope I've described this clearly.

Regards,
Guanpeng Xu


From: Stefan Monnier <address@hidden>
To: "Herbert Euler" <address@hidden>
CC: address@hidden
Subject: Re: Fcall_process: wrong conversion
Date: Mon, 15 May 2006 10:25:27 -0400

> Fcall_process in callproc.c, which corresponds to `call-process',
> cannot handle UTF-16 (either LE or BE) correctly.  Take a look at line

Actually, it handles it just fine.  The problem is that call-process and
start-process both use the same coding system to encode the arguments
and to encode the data sent to the process via stdin, whereas you want
them to be distinct.
If you want them to be distinct, you need to encode your arguments
manually before passing them to call-process.

I.e., the hexl-mode bug is in hexl.el.  Please report it separately,
explaining how to reproduce the problem (I don't know what you mean by
"applying `hexl-mode' to UTF-16 texts").
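Stefan's suggestion amounts to choosing the argument coding independently of the data coding.  The Python sketch below models that idea only; it is not Emacs Lisp and not how call-process is implemented.  `prepare_call` is a hypothetical helper, and note that Python's "utf-16-le" codec writes no byte order mark, unlike the Emacs coding system in the report.

```python
from typing import List, Tuple

def prepare_call(args: List[str], stdin_text: str,
                 arg_coding: str, data_coding: str) -> Tuple[List[bytes], bytes]:
    # Hypothetical helper: encode the command-line arguments and the stdin
    # data with two independent codings, which the caller must arrange by
    # hand because call-process uses a single coding system for both.
    argv = [a.encode(arg_coding) for a in args]
    stdin_bytes = stdin_text.encode(data_coding)
    return argv, stdin_bytes

# Separate codings: plain ASCII arguments, UTF-16LE data.
argv, stdin_bytes = prepare_call(["-hex"], "a",
                                 arg_coding="ascii", data_coding="utf-16-le")
print(argv, stdin_bytes)   # [b'-hex'] b'a\x00'

# One coding for both (the situation in the report): NULs leak into argv.
argv_same, _ = prepare_call(["-hex"], "a",
                            arg_coding="utf-16-le", data_coding="utf-16-le")
print(argv_same)           # [b'-\x00h\x00e\x00x\x00']
```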


        Stefan
