From: Eli Zaretskii
Subject: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 08 Aug 2009 15:20:10 +0300

> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Fri, 7 Aug 2009 09:50:54 +0100
>
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
>
> No, but forcing the coding system of any buffer to utf-8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
>
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
>
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution. Handa-san, can you please comment on
what's below? Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use the buffer's encoding when writing the text
to the temporary file. OTOH, it needs to avoid encoding the arguments
passed to the `hexl' program with the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system. However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well. To resolve this contradiction, hexlify-buffer encodes the
arguments manually (with locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded again by
call-process-region.

But call-process (called by call-process-region) does this:

  /* If arguments are supplied, we may have to encode them.  */
  if (nargs >= 5)
    {
      int must_encode = 0;
      Lisp_Object coding_attrs;

      for (i = 4; i < nargs; i++)
        CHECK_STRING (args[i]);

      for (i = 4; i < nargs; i++)
        if (STRING_MULTIBYTE (args[i]))
          must_encode = 1;

      if (!NILP (Vcoding_system_for_write))
        val = Vcoding_system_for_write;
      else if (! must_encode)
        val = Qnil;
      else
        {
          args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
          args2[0] = Qcall_process;
          for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
          coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string. (This particular bug
can easily be solved by making the test for must_encode before we test
that coding-system-for-write is non-nil, but I'm not sure this is the
right solution because other arguments could be multibyte strings,
which will still cause us to use coding-system-for-write for _all_
arguments.)
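
To make the ordering issue concrete, here is a minimal standalone sketch of the two test orders. It is not Emacs code: the enum, the struct, and the functions choose_current and choose_reordered are hypothetical stand-ins for the Lisp objects and for the logic in the fragment above, and "reordered" is only the variant I mention in the parenthesis, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Lisp values; not Emacs code.  */
enum arg_coding { CODING_NONE, CODING_FOR_WRITE, CODING_FROM_LOOKUP };

/* Hypothetical model of one argument string.  */
struct model_arg
{
  const char *text;
  int multibyte;                /* models STRING_MULTIBYTE (arg) */
};

/* Return 1 if at least one argument is a multibyte string.  */
static int
any_multibyte (const struct model_arg *args, size_t n)
{
  size_t i;

  assert (args != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (args[i].multibyte)
      return 1;
  return 0;
}

/* Order of tests as in the quoted fragment: coding-system-for-write
   wins whenever it is set, whatever the arguments look like.  */
static enum arg_coding
choose_current (int cs_for_write_set, const struct model_arg *args, size_t n)
{
  int must_encode = any_multibyte (args, n);

  if (cs_for_write_set)
    return CODING_FOR_WRITE;
  else if (!must_encode)
    return CODING_NONE;
  else
    return CODING_FROM_LOOKUP;
}

/* Reordered variant: test must_encode first.  All-unibyte argument
   lists are left alone, but a single multibyte argument still selects
   coding-system-for-write for every argument.  */
static enum arg_coding
choose_reordered (int cs_for_write_set, const struct model_arg *args, size_t n)
{
  if (!any_multibyte (args, n))
    return CODING_NONE;
  if (cs_for_write_set)
    return CODING_FOR_WRITE;
  return CODING_FROM_LOOKUP;
}
```

In this model, an all-unibyte argument list with coding-system-for-write set selects CODING_FOR_WRITE under the current order and CODING_NONE under the reordered one. A list with one multibyte argument selects CODING_FOR_WRITE under both, which is the reservation stated above.
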
And second, this fragment, which actually encodes the arguments,
further down in call-process:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

      GCPRO5 (infile, buffer, current_dir, path, error_file);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
        {
          argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
          if (CODING_REQUIRE_ENCODING (&argument_coding))
            /* We must encode this argument.  */
            args[i] = encode_coding_string (&argument_coding, args[i], 1);
        }

encodes the argument even though argument_coding.src_multibyte is set
to 0 (the argument is a unibyte string). Is encode_coding_string
supposed to encode unibyte strings?
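
For illustration only, here is a toy model of what re-encoding an already-encoded argument could do. It assumes that a with-signature UTF-8 encoder prepends the three-byte mark EF BB BF to each string it encodes; whether encode_coding_string behaves exactly this way for unibyte input is the open question above. The struct and all function names are hypothetical, and the per-argument guard is one possible alternative, not what call-process does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one command-line argument; not Emacs code.  */
struct model_arg
{
  char bytes[64];
  int multibyte;                /* models STRING_MULTIBYTE (arg) */
};

/* Toy encoder (an assumption): prepend the UTF-8 signature bytes and
   mark the result as unibyte.  */
static void
toy_encode_with_signature (struct model_arg *arg)
{
  char out[64];
  size_t len = strlen (arg->bytes);

  assert (len + 3 < sizeof out);
  memcpy (out, "\xEF\xBB\xBF", 3);
  memcpy (out + 3, arg->bytes, len + 1);   /* copy the NUL as well */
  memcpy (arg->bytes, out, len + 4);
  arg->multibyte = 0;
}

/* Models the quoted loop: every argument goes through the encoder.  */
static void
encode_all (struct model_arg *args, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    toy_encode_with_signature (&args[i]);
}

/* Alternative with a per-argument guard: unibyte arguments are
   assumed to be encoded already and are passed through untouched.  */
static void
encode_multibyte_only (struct model_arg *args, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (args[i].multibyte)
      toy_encode_with_signature (&args[i]);
}
```

Under this model, a unibyte option string such as "-hex" comes out of encode_all with the signature bytes in front of it, so the called program would no longer see it as the same option, while encode_multibyte_only leaves it unchanged.
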