From: Richard Stallman
Subject: Re: Displaying bytes (was: Inadequate documentation of silly characters on screen.)
Date: Sun, 29 Nov 2009 11:01:21 -0500

We don't want to raise the priority of windows-1252 because it would
cause many other encodings not to be recognized.

If it turns out that windows-1252 files are the main cause of
8-bit-control characters in the buffer, here's another idea.
If visiting a file gives you some 8-bit-control characters,
ask the user "Is this file encoded in Windows encoding (windows-1252)?"
and decode it as windows-1252 if she says yes.

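
A minimal sketch of the prompt idea, written in Python purely for illustration (it is not Emacs code). The names `has_c1_bytes`, `decode_with_prompt`, and the `ask` callback are hypothetical; `ask` stands in for whatever interactive yes/no prompt the editor would use. The sketch assumes the trigger is any byte in 0x80-0x9F, since Latin-1 assigns those bytes only C1 control codes while windows-1252 maps most of them to printable punctuation.

```python
from typing import Callable

# Latin-1 assigns bytes 0x80-0x9F only to C1 control codes; windows-1252
# maps most of them to printable characters (curly quotes, dashes, ...).
C1_BYTES = range(0x80, 0xA0)


def has_c1_bytes(data: bytes) -> bool:
    """Return True if any byte lies in 0x80-0x9F."""
    return any(b in C1_BYTES for b in data)


def decode_with_prompt(data: bytes, ask: Callable[[str], bool],
                       fallback: str = "latin-1") -> str:
    """Decode `data`, offering windows-1252 only when C1 bytes appear.

    `ask` is a hypothetical stand-in for an interactive yes/no prompt.
    """
    if has_c1_bytes(data) and ask(
            "Is this file encoded in Windows encoding (windows-1252)?"):
        # windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
        # so replace those rather than failing.
        return data.decode("cp1252", errors="replace")
    return data.decode(fallback)
```

For example, `decode_with_prompt(b"\x93hi\x94", ask=lambda q: True)` gives "hi" wrapped in curly double quotes (U+201C, U+201D), while a file with no bytes in that range never triggers the question at all.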
Here's another idea. We could employ some heuristics to see if the
distribution of those characters seems typical for the way those
characters are used. For instance, some of the punctuation characters
(the ones that represent quotation marks) should always have
whitespace or punctuation on at least one side. Also, there should be
no ASCII control characters other than whitespace. Maybe more
specific heuristics can be developed.

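
The two heuristics above might be sketched as follows, again in Python and under stated assumptions: the function and constant names are hypothetical, the quotation-mark bytes are taken to be windows-1252's 0x91-0x94 (single and double curly quotes), and "punctuation" is approximated by ASCII punctuation plus a few windows-1252 punctuation bytes. One design choice is worth flagging: 0x92 (right single quote) is commonly used as an apostrophe inside words, as in "don't", so this sketch exempts it from the neighbour rule rather than rejecting ordinary contractions.

```python
import string

# windows-1252 quotation-mark bytes: 0x91 and 0x92 (single), 0x93 and 0x94 (double).
QUOTE_BYTES = frozenset(b"\x91\x92\x93\x94")
# 0x92 often serves as an apostrophe inside words, so skip it below.
CHECKED_QUOTES = QUOTE_BYTES - {0x92}
WHITESPACE = frozenset(b" \t\n\r\f\v")
# Neighbours that count as "whitespace or punctuation" (an assumption):
# ASCII whitespace and punctuation, the quote bytes themselves, and the
# windows-1252 ellipsis (0x85), en dash (0x96) and em dash (0x97).
BOUNDARY = (WHITESPACE
            | frozenset(string.punctuation.encode("ascii"))
            | QUOTE_BYTES
            | frozenset(b"\x85\x96\x97"))


def quotes_look_plausible(data: bytes) -> bool:
    """Each checked quote byte has whitespace or punctuation (or the start
    or end of the data) on at least one side."""
    for i, b in enumerate(data):
        if b not in CHECKED_QUOTES:
            continue
        before_ok = i == 0 or data[i - 1] in BOUNDARY
        after_ok = i == len(data) - 1 or data[i + 1] in BOUNDARY
        if not (before_ok or after_ok):
            return False
    return True


def is_ascii_control(b: int) -> bool:
    """ASCII control characters are 0x00-0x1F and DEL (0x7F)."""
    return b < 0x20 or b == 0x7F


def no_stray_controls(data: bytes) -> bool:
    """No ASCII control characters other than whitespace."""
    return all(b in WHITESPACE or not is_ascii_control(b) for b in data)
```

Both checks only ever reject, so they can be combined freely with any further heuristics that are developed later.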
These could be used as conditions for recognizing the file as
windows-1252. If these heuristics are strong enough, they could
reject nearly all false matches, provided the file is long enough.
(A minimum length could be part of the conditions.) Then we
could increase the priority of windows-1252 without the bad
side effect of using it when it is not intended.

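
One way to picture "conditions for recognizing the file" plus a minimum length is the following Python sketch. Everything in it is hypothetical: the names, the 512-byte `MIN_LENGTH` threshold, the extra assumption that the data must contain at least one byte in 0x80-0x9F before windows-1252 is preferred, and the coding-system strings, which are only labels here. Heuristics are passed in as plain callables, so the quote and control-character checks sketched for the previous paragraph (or any others) can be plugged in.

```python
from typing import Callable, Sequence

Check = Callable[[bytes], bool]

# Hypothetical threshold: below this size the heuristics are treated as
# too weak to rule out false matches.
MIN_LENGTH = 512


def looks_like_windows_1252(data: bytes, checks: Sequence[Check],
                            min_length: int = MIN_LENGTH) -> bool:
    """Accept windows-1252 only if the data is long enough, contains at
    least one byte in 0x80-0x9F, and passes every heuristic check."""
    if len(data) < min_length:
        return False
    if not any(0x80 <= b <= 0x9F for b in data):
        return False
    return all(check(data) for check in checks)


def coding_priority(data: bytes, default_order: Sequence[str],
                    checks: Sequence[Check],
                    min_length: int = MIN_LENGTH) -> list[str]:
    """Move windows-1252 to the front of the candidate list only when the
    heuristics accept the data; otherwise keep the default order."""
    order = list(default_order)
    if looks_like_windows_1252(data, checks, min_length):
        order = ["windows-1252"] + [c for c in order if c != "windows-1252"]
    return order
```

The point of this shape is that windows-1252 gets a high priority only for data that passes every check, so files in other encodings keep being recognized exactly as before.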
This is ad-hoc, and not elegant. But the problem is important enough
in practice that an ad-hoc solution is justified if it works well.