Hello!
Is there a reliable way to pass Unicode file names as
arguments through `start-process'?
I ran into two limitations:
1. Using `prefer-coding-system' with anything other than
`locale-coding-system', e.g.
    (prefer-coding-system 'utf-8),
causes a file name "Ö.txt" to be mis-decoded by
subprocesses -- notably including "emacs.exe", but also
all other executables I tried (both Windows builtins like
where.exe and third party executables like ffmpeg.exe or
GnuWin32 utilities).
In my case (German locale, 'utf-8 preferred coding
system) it is mis-decoded as "Ã–.txt", i.e. Emacs encodes
the process argument as 'utf-8 but the subprocess decodes
it as 'latin-1 (in my case).
While this can be fixed by an explicit encoding
    (start-process ...
      (encode-coding-string filename locale-coding-system))
such code will probably not be used in most projects, as
the issue occurs only on Windows and depends on the user
configuration (-> hard-to-find bug?). I have added some
elisp for demonstration at the end of the mail.
2. When a file-name contains characters that cannot be
encoded in the locale's encoding, e.g. Japanese
characters in a German locale, I cannot find any way to
pass the file name through the `start-process' interface.
Unlike for characters that are supported by the locale,
this fails even in a clean "emacs -Q" session.
Curiously, the file name can still be used in cmd.exe,
though entering it may require TAB-completion (even
though the active codepage shouldn't support these characters).
- Klaus
---------------- EXAMPLE CODE --------------------
;; Setup: Create a file "unifilebug/Ö.txt" with
;; some arbitrary text. Make sure it is the only file in
;; "unifilebug".
;;
;; Note that for this issue it doesn't matter what coding system
;; is chosen for file names (Unix only; on Windows the coding
;; system for file names is fixed anyway.)
;; Set the preferred coding system.
(prefer-coding-system 'utf-8)
;; Try opening it in an emacs subprocess.
;;
;; On Windows this breaks
;; if `prefer-coding-system' was called with anything other than
;; `locale-coding-system', here 'utf-8.
;;
;; On Unix (tested with Cygwin), it works fine, presumably because
;; the file name is decoded (in `directory-files') and encoded (in
;; `start-process') with the same preferred coding system.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q" file-name))
;; It can be fixed by explicitly encoding file-names. This
;; thankfully works both in the W32 and the Cygwin version of
;; emacs.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q"
                 (encode-coding-string file-name locale-coding-system)))
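;; The explicit encoding could be packaged in a small wrapper so that
;; callers do not have to remember it. This is only a rough sketch:
;; the names `my-encode-process-args' and `my-start-process-encoded'
;; are made up for this mail and are not part of Emacs. It assumes
;; that the subprocess decodes its arguments in the locale's
;; encoding, and it cannot help with characters that this encoding
;; cannot represent.
(defun my-encode-process-args (args &optional coding-system)
  "Return ARGS with each string encoded in CODING-SYSTEM.
CODING-SYSTEM defaults to `locale-coding-system'."
  (mapcar (lambda (arg)
            (encode-coding-string
             arg (or coding-system locale-coding-system)))
          args))

(defun my-start-process-encoded (name buffer program &rest args)
  "Like `start-process', but encode ARGS in `locale-coding-system'."
  (apply #'start-process name buffer program
         (my-encode-process-args args)))

;; Hypothetical usage, equivalent to the explicit form above:
;;   (my-start-process-encoded "" nil "emacs" "-Q" file-name)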
;; Now we create a file called "ufb2/こんにちは世界.txt"
;; Even in an Emacs session without `prefer-coding-system' it will
;; fail, decoding the file name as "ufb2/ .txt".
(let ((file-name (car (directory-files "ufb2" t "txt$"))))
  (start-process "" nil "emacs" "-Q" file-name))
--------------------------------------------------