
Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF


From: Eli Zaretskii
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sat, 26 Sep 2015 20:25:19 +0300

> Cc: address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Sat, 26 Sep 2015 09:01:04 -0700
> 
> Eli Zaretskii wrote:
> > So you are, in effect, saying that it is incorrect to derive the
> > default encodings from the locale's codeset?
> 
> Yes, for Emacs developers.  And come to think of it, for most Emacs users. 
> Nowadays in my experience most non-ASCII text files use UTF-8, regardless of 
> locale.

Are you sure your experience isn't biased by the fact you mostly work
in UTF-8 locales?

> The old days of having to guess encoding from the locale are passing
> away.  This is partly due to UTF-8 being the encoding of choice for
> HTML and XML, where UTF-8 overtook the older 8-bit encodings in 2008
> and now is by far the dominant encoding.

We already DTRT with XML files, and should be doing TRT with any file
format that includes the specification of the encoding in it.

The problem, IMO, is not only with disk files.  It is also with email
messages, output from processes, etc.  E.g., I routinely get Latin-1
encoded email from people whose platform is GNU/Linux.  IOW, non-UTF
encodings are far from being dead yet.

Using UTF-8 by default is certainly wrong on MS-Windows.

> One way to accommodate the new reality would be to change Emacs so that by
> default the system locale does not affect Emacs's guess of a file's encoding
> if the file's initial sample is valid UTF-8.  Users could set a variable to
> re-enable the old behavior.

The problem with this line of thought is the "initial sample" part:
how far into the file should we look, how far is far enough?  E.g.,
tips.texi has its first non-ASCII character at character position
25353.  We've been there before, and found this not reliable enough.
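
To illustrate the sampling problem, here is a minimal Python sketch.
It is not Emacs code: the function name, the sample sizes, and the
synthetic data are assumptions made up for the illustration.  The data
mimics a file whose first non-ASCII character sits at position 25353
and is encoded in Latin-1.  Any shorter sample is pure ASCII, so it
passes a UTF-8 validity check even though the file as a whole does not.

```python
import codecs


def sample_is_valid_utf8(data: bytes, sample_size: int) -> bool:
    """Return True if the first sample_size bytes look like valid UTF-8.

    An incremental decoder with final=False is used so that a multibyte
    sequence cut off by the sample boundary is not counted as an error.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data[:sample_size], final=False)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file contents: 25352 ASCII characters, then a Latin-1
# e-acute (byte 0xE9) at character position 25353, then more ASCII.
data = b"a" * 25352 + "\u00e9".encode("latin-1") + b" rest\n"

small_sample_ok = sample_is_valid_utf8(data, 4096)       # sample is all ASCII
whole_file_ok = sample_is_valid_utf8(data, len(data))    # reaches the 0xE9 byte
```

With these inputs the 4096-byte sample is accepted while the full
contents are rejected, so a guess based on the sample alone would pick
the wrong encoding for this file.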

Anyway, doesn't "(prefer-coding-system 'utf-8)" already do what you
want us to offer?


