emacs-devel

Passing unicode filenames to start-process on Windows?


From: Klaus-Dieter Bauer
Subject: Passing unicode filenames to start-process on Windows?
Date: Wed, 6 Jan 2016 16:20:29 +0100

Hello! 

Is there a reliable way to pass unicode file names as
arguments through `start-process'?

I realized two limitations:


1. Using `prefer-coding-system' with anything other than
   `locale-coding-system', e.g.
   
       (prefer-coding-system 'utf-8)
       
   causes a file name such as "Ö.txt" to be misdecoded by
   subprocesses -- notably including "emacs.exe", but also
   all other executables I tried (both Windows builtins like
   where.exe and third-party executables like ffmpeg.exe or
   GnuWin32 utilities).
   
   In my case (German locale, 'utf-8 preferred coding
   system) it is mis-decoded as "Ã–.txt", i.e. Emacs encodes
   the process argument as 'utf-8 but the subprocess decodes
   it as 'latin-1 (in my case).
   
   While this can be fixed by an explicit encoding 
   
       (start-process ... 
         (encode-coding-string filename locale-coding-system))
   
   such code will probably not be used in most projects, as
   the issue occurs only on Windows and depends on the user
   configuration (-> hard-to-find bug?). I have added some
   elisp for demonstration at the end of the mail.

     
2. When a file name contains characters that cannot be
   encoded in the locale's encoding, e.g. Japanese
   characters in a German locale, I cannot find any way to
   pass the file name through the `start-process' interface.
   Unlike with characters that are supported by the locale,
   this fails even in a clean "emacs -Q" session.
   
   Curiously, the file name can still be used in cmd.exe,
   though entering it may require TAB-completion (even
   though the active codepage shouldn't support those
   characters).
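
Both limitations can be reproduced on strings alone, without
spawning a process. The following is only a minimal sketch, not
a fix: the helper names `ufb-view-as-codepage' and
`ufb-encodable-p' are hypothetical, and windows-1252 is assumed
to be the codepage of a German Windows locale.

```elisp
;; -*- coding: utf-8 -*-

(defun ufb-view-as-codepage (name codepage)
  "Return NAME as seen by a program reading its UTF-8 bytes as CODEPAGE."
  (decode-coding-string (encode-coding-string name 'utf-8) codepage))

(defun ufb-encodable-p (name coding-system)
  "Return non-nil if every character of NAME is encodable in CODING-SYSTEM."
  (null (unencodable-char-position 0 (length name) coding-system nil name)))

;; Limitation 1: UTF-8 bytes reinterpreted in the assumed codepage.
(ufb-view-as-codepage "Ö.txt" 'windows-1252)         ; => "Ã–.txt"

;; Limitation 2: the name has no representation in that codepage,
;; so no choice of encoding on the Emacs side can help.
(ufb-encodable-p "Ö.txt" 'windows-1252)              ; => t
(ufb-encodable-p "こんにちは世界.txt" 'windows-1252)  ; => nil
```

A check like `ufb-encodable-p' only tells the two cases apart;
it does not by itself make the second case work.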


- Klaus


---------------- EXAMPLE CODE --------------------

;; Setup: Create a file "unifilebug/Ö.txt" with
;; some arbitrary text. Make sure it is the only file in
;; "unifilebug". 
;; 
;; Note that for this issue it doesn't matter what coding system
;; is chosen for file names (this applies to Unix only; on Windows
;; the coding system for file names is fixed anyway).


;; Set the preferred coding system. 
(prefer-coding-system 'utf-8)


;; Try opening it in an emacs subprocess. 
;; 
;; On Windows this breaks
;; if `prefer-coding-system' was called with anything other than
;; `locale-coding-system', here 'utf-8. 
;; 
;; On Unix (tested with Cygwin), it works fine, presumably because
;; the file name is decoded (in `directory-files') and encoded (in
;; `start-process') with the same preferred coding system.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q" file-name))


;; It can be fixed by explicitly encoding file-names. This
;; thankfully works both in the W32 and the Cygwin version of
;; emacs.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q" 
    (encode-coding-string file-name locale-coding-system)))


;; Now we create a file called "ufb2/こんにちは世界.txt".
;; Even in an Emacs session without `prefer-coding-system' it will
;; fail, decoding the file name as "ufb2/ .txt".
(let ((file-name (car (directory-files "ufb2" t "txt$"))))      
  (start-process "" nil "emacs" "-Q" file-name))


--------------------------------------------------
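
The explicit-encoding workaround from the example code could be
wrapped in a small helper so that callers do not have to
remember it. This is a sketch under assumptions: the name
`ufb-encode-process-arg' and its optional CODING-SYSTEM argument
are hypothetical, and the helper only covers the first
limitation. For file names the locale cannot represent, it
signals an error instead of silently passing a mangled name.

```elisp
;; -*- coding: utf-8 -*-

(defun ufb-encode-process-arg (arg &optional coding-system)
  "Return ARG encoded for use as a subprocess argument.
Encode with CODING-SYSTEM, defaulting to `locale-coding-system'.
Signal an error if ARG contains characters that cannot be encoded."
  (let ((coding (or coding-system locale-coding-system)))
    (cond
     ;; No locale coding system known: pass the argument through.
     ((null coding) arg)
     ;; Second limitation: the name cannot be represented at all.
     ((unencodable-char-position 0 (length arg) coding nil arg)
      (error "Cannot encode %S with %s" arg coding))
     ;; First limitation: encode explicitly, as in the example code.
     (t (encode-coding-string arg coding)))))

;; Intended use (not evaluated here, since it spawns a process):
;; (start-process "" nil "emacs" "-Q"
;;   (ufb-encode-process-arg file-name))
```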

