emacs-devel



From: Eli Zaretskii
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 11:55:36 +0300

> Cc: address@hidden, address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Sun, 27 Sep 2015 01:22:48 -0700
> 
> Eli Zaretskii wrote:
> > I've also looked at the *.po files in the latest releases of GNU Make,
> > Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
> > such files still use non-UTF-8 encodings.
> 
> Yes, and those files are a pain to look at with Emacs now, since it typically 
> misguesses their encodings.  Presumably Emacs should be looking at .po files' 
> charset= decorations.

You need to install po-mode.

But anyway, that's not the issue at hand.  I just used those files as
indicators of preferences of some locales.
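For illustration only, here is a minimal Python sketch of the kind of charset= lookup mentioned above. gettext PO files declare their encoding in the Content-Type line of the header entry (e.g. "Content-Type: text/plain; charset=ISO-8859-1\n"). The function names are hypothetical, and this is not how po-mode is implemented. An unknown charset name will make decode() raise LookupError.

```python
import re
from typing import Optional

# Matches the charset parameter of the Content-Type line in a PO header,
# e.g.  "Content-Type: text/plain; charset=ISO-8859-1\n"
_CHARSET_RE = re.compile(rb'"Content-Type:[^"]*charset=([A-Za-z0-9_.:-]+)',
                         re.IGNORECASE)

def po_declared_charset(data: bytes) -> Optional[str]:
    """Return the charset declared in a PO file header, or None."""
    m = _CHARSET_RE.search(data)
    if m is None:
        return None
    name = m.group(1).decode("ascii")
    # Template (.pot) files often still carry the placeholder "CHARSET".
    if name.upper() == "CHARSET":
        return None
    return name

def decode_po(data: bytes, fallback: str) -> str:
    """Decode using the declared charset; use the fallback only if none."""
    return data.decode(po_declared_charset(data) or fallback)
```

The fallback argument stands in for whatever default the editor would otherwise use. It is consulted only when the file says nothing about itself.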

> > while I agree with you that UTF-8 encoded files are the majority
> > among non-ASCII files (and Emacs development aligns itself with that
> > fact very well), the non-UTF-8 minority, even in the Posix world, is
> > still significant enough, and we cannot possibly ignore it.
> 
> Naturally we cannot ignore it.  All I'm suggesting is that we change the
> default behavior so that it's more UTF-8 friendly, since that's the way the
> world is going.  The old Emacs behavior should still be available, for
> people who need it.

You use "default" here in a sense that is different from what the Mule
stuff does.  Since Emacs attempts to support i18n, not just l10n, it
cannot ask users to modify their defaults whenever they meet a file
that's decoded incorrectly.  Emacs uses the defaults in this area as
the last resort, when no other information is available in the file
itself or its accompanying meta-data.  That default is already as
friendly to UTF-8 as possible: UTF-8 is used in any locale where
that's the default.  Going further, i.e. preferring UTF-8 in locales
whose preferences are different, will simply bring back the old bugs
and misfeatures of Emacs 20 and 21 which we worked so hard to
eradicate.
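Here is a rough Python sketch of the precedence described above. It assumes three sources of information: a declaration in the file itself, accompanying meta-data supplied by the caller, and the locale's preferred encoding as the last resort. The in-file declaration is simplified here to an Emacs-style "-*- coding: NAME -*-" cookie on one of the first two lines. The function names are hypothetical. Emacs's real logic consults more places and applies more heuristics.

```python
import locale
import re
from typing import Optional

# Simplified recognizer for an Emacs-style "-*- coding: NAME -*-" cookie.
# Emacs itself checks more places (e.g. a Local Variables block at the end
# of the file); this sketch only looks at the first two lines.
_COOKIE_RE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+).*?-\*-")

def cookie_coding(data: bytes) -> Optional[str]:
    for line in data.splitlines()[:2]:
        m = _COOKIE_RE.search(line)
        if m:
            return m.group(1).decode("ascii")
    return None

def choose_coding(data: bytes, metadata_coding: Optional[str] = None) -> str:
    # 1. Information in the file itself wins.
    coding = cookie_coding(data)
    if coding:
        return coding
    # 2. Then accompanying meta-data (e.g. a charset= parameter).
    if metadata_coding:
        return metadata_coding
    # 3. Only as a last resort, fall back to the locale's preference.
    return locale.getpreferredencoding(False)
```

The locale default appears only on the final line. Making it "more UTF-8 friendly" would change nothing for files that declare their encoding, and for the rest it would override a user's locale, not the file's own information.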

IMO, the _only_ sane way forward is to introduce more reliable ways of
detecting the encoding, whether by using some new kinds of meta-data
or by more extensive analysis of the text itself.  (The latter
solution will probably have difficulties with decoding sub-process
output, but it could be very efficient with disk files and large
bodies of text made available to Emacs at once.)
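One simple example of analyzing the text itself follows. This is a sketch, not Emacs's algorithm. Bytes that contain non-ASCII octets yet decode as strict UTF-8 are very likely UTF-8, because text in legacy 8-bit encodings rarely forms well-formed multibyte sequences by accident. The function name is hypothetical.

```python
from typing import Optional

def guess_from_content(data: bytes) -> Optional[str]:
    """Return a coding name if the bytes alone settle the question, else None."""
    if data.isascii():
        # Pure ASCII decodes identically in every ASCII-compatible encoding,
        # so no real decision is needed.
        return "us-ascii"
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: some legacy encoding, and the bytes alone
        # do not say which one, so leave the decision to other sources.
        return None
    # Valid UTF-8 containing non-ASCII bytes: a strong guess.
    return "utf-8"
```

A chunk that ends in the middle of a multibyte sequence fails the strict check even when the complete stream is valid UTF-8. This can happen with sub-process output read in pieces. It is one reason such analysis works better on whole files than on sub-process output.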

IOW, I don't think we will be able to change our locale-derived
defaults any time soon.  What we can do is minimize the probability of
having to fall back on those defaults.  But this requires that
Someone™ volunteers to revamp our detect_coding_* implementations in
that direction.



