emacs-devel

Re: Fcall_process: wrong conversion


From: Herbert Euler
Subject: Re: Fcall_process: wrong conversion
Date: Tue, 16 May 2006 10:59:10 +0800

This doesn't work.  I've traced through the code, and the reason seems
to be as follows.

You changed the code in hexl.el to:

 (let ((coding-system-for-read 'raw-text)
       (coding-system-for-write buffer-file-coding-system)
       (buffer-undo-list t))
   (apply 'call-process-region (point-min) (point-max)
          (expand-file-name hexl-program exec-directory)
          t t nil
          ;; Manually encode the args, otherwise they're encoded using
          ;; coding-system-for-write (i.e. buffer-file-coding-system) which
          ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
          (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
                  (split-string hexl-options)))

So when `call-process' is invoked, `coding-system-for-write' is not
nil; in my test it is `utf-16le-with-signature'.  The part of
callproc.c that decides the argument coding system is at lines 269 to
300:

   if (nargs >= 5)
     {
       int must_encode = 0;

       for (i = 4; i < nargs; i++)
         CHECK_STRING (args[i]);

       for (i = 4; i < nargs; i++)
         if (STRING_MULTIBYTE (args[i]))
           must_encode = 1;

       if (!NILP (Vcoding_system_for_write))
         val = Vcoding_system_for_write;
       else if (! must_encode)
         val = Qnil;
       else
         {
           args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
           args2[0] = Qcall_process;
           for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
           coding_systems = Ffind_operation_coding_system (nargs + 1, args2);
           if (CONSP (coding_systems))
             val = XCDR (coding_systems);
           else if (CONSP (Vdefault_process_coding_system))
             val = XCDR (Vdefault_process_coding_system);
           else
             val = Qnil;
         }
       val = coding_inherit_eol_type (val, Qnil);
       setup_coding_system (Fcheck_coding_system (val), &argument_coding);
     }
 }

If `Vcoding_system_for_write' is not nil, `val' is set to that value,
whether or not any argument is multibyte.  So the last line of this
code sets the `detector', `decoder', and `encoder' fields of
`argument_coding' to the UTF-16 routines and turns on the
CODING_REQUIRE_ENCODING_MASK flag in `argument_coding.common_flags'
(coding.c, lines 5042 to 5059):

 else if (EQ (coding_type, Qutf_16))
   {
     val = AREF (attrs, coding_attr_utf_16_bom);
     CODING_UTF_16_BOM (coding) = (CONSP (val) ? utf_16_detect_bom
                                   : EQ (val, Qt) ? utf_16_with_bom
                                   : utf_16_without_bom);
     val = AREF (attrs, coding_attr_utf_16_endian);
     CODING_UTF_16_ENDIAN (coding) = (EQ (val, Qbig) ? utf_16_big_endian
                                      : utf_16_little_endian);
     CODING_UTF_16_SURROGATE (coding) = 0;
     coding->detector = detect_coding_utf_16;
     coding->decoder = decode_coding_utf_16;
     coding->encoder = encode_coding_utf_16;
     coding->common_flags
       |= (CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK);
     if (CODING_UTF_16_BOM (coding) == utf_16_detect_bom)
       coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
   }

Now go back to callproc.c, lines 410 to 427:

 if (nargs > 4)
   {
     register int i;
     struct gcpro gcpro1, gcpro2, gcpro3;

     GCPRO3 (infile, buffer, current_dir);
     argument_coding.dst_multibyte = 0;
     for (i = 4; i < nargs; i++)
       {
         argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
         if (CODING_REQUIRE_ENCODING (&argument_coding))
           /* We must encode this argument.  */
           args[i] = encode_coding_string (&argument_coding, args[i], 1);
         new_argv[i - 3] = SDATA (args[i]);
       }
     UNGCPRO;
     new_argv[nargs - 3] = 0;
   }

`CODING_REQUIRE_ENCODING' tests the following conditions (coding.h,
lines 491 to 496):

/* Return 1 if the coding context CODING requires code conversion on
  encoding.  */
#define CODING_REQUIRE_ENCODING(coding)                         \
 ((coding)->src_multibyte                                    \
  || (coding)->common_flags & CODING_REQUIRE_ENCODING_MASK       \
  || (coding)->mode & CODING_MODE_SELECTIVE_DISPLAY)

Although `argument_coding.src_multibyte' is 0 for an argument that
has already been encoded (it is then a unibyte string),
`argument_coding.common_flags & CODING_REQUIRE_ENCODING_MASK' is
non-zero in this case.  So `CODING_REQUIRE_ENCODING
(&argument_coding)' is true, and the argument is encoded once more.

As a result, encoding the arguments with `encode-coding-string'
beforehand, as your change does, does not stop `call-process' from
converting them again with `coding-system-for-write'.  Perhaps we
should not bind `coding-system-for-write' in the `let' form in such
cases.

There is another problem: if `locale-coding-system' is UTF-16, is it
correct to prepend the byte-order mark "\377\376" or "\376\377" to
every command argument?  If not, the current code of `call-process' is
wrong, since it always adds the prefix.

Regards,
Guanpeng Xu


From: Stefan Monnier <address@hidden>
To: "Herbert Euler" <address@hidden>
CC: address@hidden
Subject: Re: Fcall_process: wrong conversion
Date: Mon, 15 May 2006 12:06:48 -0400

>    - Create a file containing UTF-16 text; either UTF-16BE or UTF-16LE
>      is OK.  For example, create a file whose content is "a" in
>      UTF-16LE and name it "1".
[...]
>    - In case the buffer is encoded with utf-16-le, the content is
>      displayed as "a".  Type M-x hexl-mode RET, the result is

>          \377?: Invalid argument

>      displayed in the buffer.

Thanks.  I've installed the patch below which should fix the problem.
Please confirm,


        Stefan


--- hexl.el     11 avr 2006 12:45:49 -0400      1.103
+++ hexl.el     15 mai 2006 12:02:32 -0400
@@ -704,7 +704,12 @@
        (buffer-undo-list t))
     (apply 'call-process-region (point-min) (point-max)
           (expand-file-name hexl-program exec-directory)
-          t t nil (split-string hexl-options))
+          t t nil
+           ;; Manually encode the args, otherwise they're encoded using
+           ;; coding-system-for-write (i.e. buffer-file-coding-system) which
+           ;; may not be what we want (e.g. utf-16 on a non-utf-16 system).
+           (mapcar (lambda (s) (encode-coding-string s locale-coding-system))
+                   (split-string hexl-options)))
     (if (> (point) (hexl-address-to-marker hexl-max-address))
        (hexl-goto-address hexl-max-address))))



_______________________________________________
Emacs-devel mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/emacs-devel
