bug#15803: default-file-name-coding-system: utf-8 better than latin-1 th

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date:	Fri, 08 Dec 2017 11:46:29 +0200

> From: Glenn Morris <rgm@gnu.org>
> Cc: 15803@debbugs.gnu.org
> Date: Mon, 04 Dec 2017 19:35:05 -0500
> 
> Eli Zaretskii wrote:
> 
> > Perhaps on Posix systems, but not elsewhere. 
> 
> I assume non-POSIX is newspeak for MS-Windows (native and DOS).

I didn't say "non-Posix"; you did.

MS-Windows is definitely not a Posix system, but whether it is the
only one, I don't know.  Are we sure all macOS/Darwin systems are
sufficiently Posix in this respect?  AFAIR they use quite different
encoding methods for file names (canonical normalization etc.).
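
A minimal sketch of the normalization concern, not taken from the patch under discussion: the same visible file name has different UTF-8 byte sequences in composed (NFC) and decomposed (NFD) form. It assumes the `ucs-normalize` library that ships with Emacs; the sample name and the `my-` helper are hypothetical.

```elisp
(require 'ucs-normalize)

;; Hypothetical sample name containing one precomposed character (U+00E9).
(defvar my-sample-name "caf\u00e9.txt")

(defun my-utf-8-bytes (string)
  "Return the bytes of STRING encoded as UTF-8, as a list of integers."
  (append (encode-coding-string string 'utf-8) nil))

;; Composed form: U+00E9 is encoded as the two bytes C3 A9.
(my-utf-8-bytes (ucs-normalize-NFC-string my-sample-name))

;; Decomposed form: "e" followed by U+0301, encoded as 65 CC 81.
(my-utf-8-bytes (ucs-normalize-NFD-string my-sample-name))
```

The two results differ in length, so a byte-wise comparison of file names can fail even when both sides use UTF-8.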

> > And if we make the change, we should make sure building Emacs in a
> > non-ASCII directory still works.
> 
> It works fine for me on G/L to have source, build, and install
> directories be distinct non-ASCII directories.

Was it in a UTF-8 locale or in a non-UTF-8 locale?  The latter is the
potentially problematic case, AFAIR.

> (Emacs works, that is,
> but makeinfo 5.1 fails to find @include files in non-ASCII directories,
> so I wonder how common such setups are.)

Building a release tarball doesn't require makeinfo.

> BTW, it feels very dated to me to have discussion of Windows 9X in the
> Emacs manual section on file-name-coding.

We still try to support it, and the aspects of file-name encoding
related to it are definitely non-trivial.  Everything described there
is in the code.

> diff --git i/doc/emacs/mule.texi w/doc/emacs/mule.texi
> index 78f77cb..5fc44a6 100644
> --- i/doc/emacs/mule.texi
> +++ w/doc/emacs/mule.texi
> @@ -1214,11 +1214,8 @@ system can encode.
>  
>    If @code{file-name-coding-system} is @code{nil}, Emacs uses a
>  default coding system determined by the selected language environment,
> -and stored in the @code{default-file-name-coding-system} variable.
> -@c FIXME?  Is this correct?  What is the "default language environment"?
> -In the default language environment, non-@acronym{ASCII} characters in
> -file names are not encoded specially; they appear in the file system
> -using the internal Emacs representation.
> +and stored in the @code{default-file-name-coding-system} variable
> +(normally UTF-8).

Not sure why you removed the sentence which had the FIXME comment.  Is
it in any way related to the issue at hand?

>  @cindex file-name encoding, MS-Windows
>  @vindex w32-unicode-filenames
> diff --git i/lisp/international/mule-cmds.el w/lisp/international/mule-cmds.el
> index 9d22d6e..192f0e9 100644
> --- i/lisp/international/mule-cmds.el
> +++ w/lisp/international/mule-cmds.el
> @@ -1797,10 +1797,11 @@ The default status is as follows:
>     'raw-text)
>  
>    (set-default-coding-systems nil)
> -  (setq default-sendmail-coding-system 'iso-latin-1)
> -  ;; On Darwin systems, this should be utf-8-unix, but when this file is loaded
> -  ;; that is not yet defined, so we set it in set-locale-environment instead.
> -  (setq default-file-name-coding-system 'iso-latin-1-unix)
> +  (setq default-sendmail-coding-system 'utf-8)
> +  (setq default-file-name-coding-system (if (memq system-type
> +                                                  '(windows-nt ms-dos))
> +                                            'iso-latin-1-unix
> +                                          'utf-8-unix))

Why are we changing sendmail-coding-system?  It has nothing to do with
file names, AFAIK.

>    ;; Preserve eol-type from existing default-process-coding-systems.
>    ;; On non-unix-like systems in particular, these may have been set
>    ;; carefully by the user, or by the startup code, to deal with the
> @@ -1816,8 +1817,10 @@ The default status is as follows:
>       (input-coding
>        (condition-case nil
>            (coding-system-change-text-conversion
> -           (cdr default-process-coding-system) 'iso-latin-1)
> -        (coding-system-error 'iso-latin-1))))
> +           (cdr default-process-coding-system)
> +           (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
> +        (coding-system-error
> +         (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
>      (setq default-process-coding-system
>         (cons output-coding input-coding)))

And this changes the default encoding used to communicate with
sub-processes.  Why?  We never talked about a wholesale change of all
the defaults to UTF-8; that is a much broader issue than just the
encoding of file names.

> diff --git i/lisp/mh-e/mh-comp.el w/lisp/mh-e/mh-comp.el
> index 98067ce..25118cd 100644
> --- i/lisp/mh-e/mh-comp.el
> +++ w/lisp/mh-e/mh-comp.el
> @@ -304,6 +304,7 @@ message and scan line."
>    (let ((draft-buffer (current-buffer))
>          (file-name buffer-file-name)
>          (config mh-previous-window-config)
> +        ;; FIXME this is subtly different to select-message-coding-system.
>          (coding-system-for-write
>           (if (and (local-variable-p 'buffer-file-coding-system
>                                      (current-buffer)) ;XEmacs needs two args
> @@ -315,7 +316,7 @@ message and scan line."
>             (or (and (boundp 'sendmail-coding-system) sendmail-coding-system)
>                 (and (default-boundp 'buffer-file-coding-system)
>                      (default-value 'buffer-file-coding-system))
> -               'iso-latin-1))))
> +               'utf-8))))

Changes like that in MH-E should be communicated to the MH-E
developer; I'm not sure he is reading this list.

And you never answered my question about the rationale:

> Btw, why does the default matter so much?  Once Emacs starts up
> default-file-name-coding-system on GNU/Linux is set to UTF-8, if the
> locale says so.  Is this just an aesthetic issue?
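
To make the question concrete, here is a rough sketch of how the coding system in effect for file names can be inspected, and of what the two candidate defaults do to a single non-ASCII character. The `my-` helpers are hypothetical names introduced for illustration; the fallback follows the manual text quoted above, where `file-name-coding-system` takes precedence when non-nil.

```elisp
(defun my-effective-file-name-coding-system ()
  "Return the coding system in effect for file names.
Hypothetical helper: `file-name-coding-system' wins when non-nil,
otherwise `default-file-name-coding-system' is used."
  (or file-name-coding-system default-file-name-coding-system))

(defun my-file-name-bytes (name coding-system)
  "Return the bytes of NAME encoded with CODING-SYSTEM, as a list."
  (append (encode-coding-string name coding-system) nil))

;; The same one-character name under the two defaults discussed here.
(my-file-name-bytes "\u00e9" 'iso-latin-1-unix) ; one byte, E9
(my-file-name-bytes "\u00e9" 'utf-8-unix)       ; two bytes, C3 A9
```

Evaluating `(my-effective-file-name-coding-system)` after startup shows whether the locale has already replaced the built-in default.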
