emacs-pretest-bug

Re: local chars displayed as numbers


From: Kenichi Handa
Subject: Re: local chars displayed as numbers
Date: Sat, 23 Sep 2006 15:29:29 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

> > I don't think it uncommon.  People migrate from Windows to GNU/Linux
> > (or switch between both), people exchange files with Windows users,
> > ... (and on Windows, it's quite common to insert `smart quotes' and
> > other non-Latin-1 characters).

> True, but in my experience plain-text files using windows-1252 are still
> rather uncommon under GNU/Linux.  Of course, it depends on the specifics,
> but adapting Emacs to the specific circumstance should be done via the
> .emacs, I think.

> > What is the benefit of treating it as raw-text instead of windows-1252,
> > assuming that the file only contains characters from windows-1252?  We
> > are talking about a file (> 300000 chars of text) with mostly ASCII,
> > some Latin-1 [ÄÖÜäöüß] (1.3%, probably typical for a German text), and
> > 19 \202 characters (= 0.005%).

> Obviously, in the case where the file is using windows-1252 encoding, there's
> no harm in Emacs using the windows-1252 encoding.  But what about the other
> cases, e.g. if the file is just binary, or slightly incorrect utf-8, or ...?
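To make the "slightly incorrect utf-8" case concrete, here is a minimal Python sketch. Python's built-in `utf-8` and `cp1252` codecs stand in for Emacs's coding systems (an assumption; this is not how Emacs itself detects encodings), and `strict_decode` is a hypothetical helper. A UTF-8 file with a single stray byte is rejected by strict UTF-8 decoding, yet windows-1252 accepts it without complaint and silently turns every non-ASCII character into mojibake:

```python
from typing import Optional

def strict_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return `data` decoded with `encoding`, or None if any byte is invalid there."""
    try:
        return data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError:
        return None

# Illustration only: Python codecs, not Emacs coding systems.
utf8_text = "Grüße, Ä".encode("utf-8")  # well-formed UTF-8
damaged = utf8_text + b"\xff"           # one stray byte: no longer valid UTF-8

as_utf8 = strict_decode(damaged, "utf-8")     # None: rejected
as_cp1252 = strict_decode(damaged, "cp1252")  # "GrÃ¼ÃŸe, Ã„ÿ": accepted, but garbled
```

The windows-1252 result raises no error at all, which is exactly why a wrong guess in that direction is easy to miss.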

At least, windows-1252 doesn't cover all eight-bit bytes.
A few byte values are undefined in it: 0x81, 0x8d, 0x8f,
0x90, and 0x9d.
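A quick way to list the bytes windows-1252 leaves undefined is to ask Python's `cp1252` codec. This assumes Python's table follows Microsoft's published code page; Emacs's own windows-1252 coding system may not match it byte for byte. The helper names below are hypothetical. A file containing any of these bytes cannot be windows-1252, which gives a detector one cheap negative signal:

```python
from typing import List

def undefined_cp1252_bytes() -> List[int]:
    """Return the 8-bit byte values that Python's cp1252 codec refuses to decode."""
    undefined = []
    for b in range(0x80, 0x100):
        try:
            bytes([b]).decode("cp1252")  # strict mode raises on undefined bytes
        except UnicodeDecodeError:
            undefined.append(b)
    return undefined

def could_be_cp1252(data: bytes) -> bool:
    """False if `data` contains a byte that windows-1252 leaves undefined."""
    bad = set(undefined_cp1252_bytes())
    return not any(b in bad for b in data)

print([hex(b) for b in undefined_cp1252_bytes()])
# ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

A check like this can only rule windows-1252 out; passing it says nothing about whether the bytes are really text.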

Anyway, how about thinking about the situation this way.

When one visits a binary file and it's detected as
windows-1252, he can usually notice at once that the
auto-detection did the wrong thing, because a binary file
tends to contain many 8-bit bytes in the first page.  So he
can re-read the file with C-x C-m c binary RET C-x C-v RET.
But when one visits a windows-1252 file and it's read as
raw-text, it's harder to notice that the file was not
decoded correctly, because the first page may not contain
any raw bytes.  In that case, he'll notice the problem only
after he has done some editing, and by then it is too late
to simply re-read the file.
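The asymmetry above can be sketched as a hypothetical heuristic: measure what share of the bytes on the first page have the high bit set. The names `FIRST_PAGE` and `eight_bit_ratio` and the 2000-byte window are illustrative assumptions, not anything Emacs defines. Binary data scores high, so a wrong windows-1252 guess shows up immediately, while mostly-ASCII German text scores low, so a wrong raw-text guess may leave few or no visible raw bytes on the first page:

```python
FIRST_PAGE = 2000  # hypothetical size of "the first page", in bytes

def eight_bit_ratio(data: bytes, window: int = FIRST_PAGE) -> float:
    """Fraction of the first `window` bytes of `data` that have the high bit set."""
    head = data[:window]
    if not head:
        return 0.0
    return sum(1 for b in head if b >= 0x80) / len(head)

# Crude stand-ins: every byte value in turn vs. mostly-ASCII German text.
binary_like = bytes(range(256)) * 16
german_text = ("Die Straße ist lang. " * 200).encode("cp1252")

print(eight_bit_ratio(binary_like))  # 0.488
print(eight_bit_ratio(german_text))  # 0.0475
```

Where to put a threshold between those two values is left open here; the point is only how far apart the two cases typically are.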

Stefan Monnier <address@hidden> writes:

> So I'd rather have a tool that explains what's going on, so that the user
> can decide to use windows-1252 if it's a good choice for her, rather than
> force windows-1252 on all users most of whom won't ever edit a file with
> windows-1252 encoding.

How about indicating a binary buffer in a more prominent
way, for instance by changing the mode line color and
showing "BINARY FILE" in the mode line?

---
Kenichi Handa
address@hidden



