emacs-devel

Re: address@hidden: Coding problem with Euro sign]


From: Kevin Rodgers
Subject: Re: address@hidden: Coding problem with Euro sign]
Date: Thu, 15 Dec 2005 15:02:48 -0700
User-agent: Mozilla Thunderbird 0.9 (X11/20041105)

Ralf Angeli wrote:
> * Kevin Rodgers (2005-12-15) writes:
>
>>Ralf Angeli wrote:
>>
>>>* Kevin Rodgers (2005-12-14) writes:
>>>>And the OP should try visiting the file with the cp1252 coding system.
>>>
>>>Well, the question now is if it is possible for Emacs to figure out
>>>the coding system by itself with the example at hand.
>>
>>You could try something like this:
>>
>>(setq auto-coding-regexp-alist
>>       (cons '("[\040-\177][\200-\237]" . cp1252)
>>             auto-coding-regexp-alist))
>>
>>I don't think that's a general purpose solution since (1)
>>auto-coding-regexp-alist actually has precedence over `-*-coding:-*-'
>>file variables and (2) other encodings probably use those o200 - o237
>>bytes (certainly other Microsoft Windows code pages do).
>
> This doesn't seem to work here.  I still see the byte codes of the
> 8-bit characters when opening the file after evaluating the above
> form.

OK, now I've actually tried that here in Emacs 21.4 running on
Unix/Solaris under X.  First it complained that cp1252 is an invalid
coding system, so I found the "MS-DOS and MULE" Info node referenced
from the "Coding Systems" node and tried `M-x codepage-setup'.  It
wouldn't take 1252, but a quick search in that node turned up 850, the
DOS Latin-1 codepage, as the closest one it does accept.
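
For anyone repeating this, here is a minimal sketch of that check.  It
assumes an Emacs 21-era setup where `codepage-setup' is still available;
later releases ship these coding systems built in, so the call may be
unnecessary or absent there.  The codepage is passed as a string because
that is what the interactive prompt reads:

```elisp
;; Non-nil if a coding system by that name is already defined.
(coding-system-p 'cp1252)

;; Assumption: `codepage-setup' exists (Emacs 21-era).  Define cp850
;; only when it is missing.
(unless (coding-system-p 'cp850)
  (when (fboundp 'codepage-setup)
    (codepage-setup "850")))
```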

So I tweaked the auto-coding-regexp-alist entry to use cp850 and
revisited the file.  Now instead of displaying the u umlaut and A
circumflex characters as such in my default font's character set
(iso8859-1) and the euro as "\200", Emacs displays the u umlaut as
superscript 3, A circumflex as "\302", and the euro as C cedilla.

I assume those display problems are because I haven't configured an
Emacs fontset for the cp850 coding system.  But the
auto-coding-regexp-alist entry worked as intended, and you're on
Windows so your fontset should be properly configured for that.
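
One way to separate font problems from decoding problems is to decode
the three bytes by hand and see which characters each coding system
assigns to them.  A sketch, assuming a recent Emacs in which both cp850
and cp1252 are defined; the bytes are the euro (\200), u umlaut (\374)
and A circumflex (\302) as they occur in a Windows-1252 file:

```elisp
;; "\200\374\302" is a unibyte string holding the raw bytes 0x80 0xFC 0xC2.
(decode-coding-string "\200\374\302" 'cp850)   ; => "dz┬"
(decode-coding-string "\200\374\302" 'cp1252)  ; => "€üÂ"
```

Under cp850 those bytes are C cedilla, superscript 3 and a box-drawing
character, which matches the display described above, so the cp850
mapping itself, rather than the fontset, may account for what is shown.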

One other detail: that entry only sets the coding system if the euro
is immediately preceded by an ASCII character.  Is that the case in
your file?  What does `C-h C RET' say after visiting the file?
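
If the euro can occur at the very start of the file or right after
another 8-bit byte, a looser entry would be needed.  This is a sketch
only: it matches any byte in the o200 - o237 range, so it is even more
prone to false positives than the entry above, and cp1252 stands for
whatever name your Emacs actually defines:

```elisp
;; Hypothetical variant: no preceding-ASCII requirement.
(setq auto-coding-regexp-alist
      (cons '("[\200-\237]" . cp1252)
            auto-coding-regexp-alist))

;; After visiting the file, evaluate this with M-: in that buffer to
;; see which coding system Emacs settled on.
buffer-file-coding-system
```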

I assume you're running with multibyte characters enabled.

> And a customization is actually not what I am interested in; I'd like
> Emacs to figure this out by itself, out of the box.

How is Emacs supposed to infer the coding system from the contents of
that file?  If you can come up with a suitable customization, perhaps
it will be incorporated into Emacs as the default behavior.

> I am not sure how common something like the case at hand is but it is
> certainly not academic.  And if one is working with different
> operating systems or interchanging files with people working on
> different operating systems the failure to detect the correct coding
> could lead to people regarding Emacs as a truly inferior piece of
> software.  I can already hear them: "What?  It displays the Euro sign
> as \200?  Even Notepad gets this right!"  On these grounds it may
> become a bit hard to convince people that Emacs is the one true
> editor.

Can Notepad display files in anything besides Windows-1252 and
probably UTF-8 with a BOM?  E.g. can it distinguish ISO 8859-1 from ISO
8859-2 from ISO 8859-15?

> Anyway, I tested a bit and under Windows (surprise) every application
> I tried (e.g. Notepad and OpenOffice) managed to display the file
> correctly.  On GNU/Linux no application got it right.  I checked with
> less, more, vim, nano, pico, and OpenOffice.  Either "garbage" was
> displayed or (in case of OpenOffice) a dialog asking the user to
> specify the encoding.  So it's not like Emacs isn't in good company.
> Nevertheless it would be nice if Emacs got it right.  Unfortunately I
> lack the knowledge for judging if this is possible at all without
> having to use all sorts of unreliable heuristics which are costly to
> implement.

Yes, Windows applications simply assume you're using a proprietary
Microsoft character set, and GNU/Linux apps prioritize support for
standard character encodings.  Maybe all you need is:
(prefer-coding-system 'cp850)
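
If changing the global preference is too broad, the coding system can
also be forced for a single visit.  A sketch, where the file name is a
placeholder and cp850 stands for whichever coding system turns out to be
right for the file:

```elisp
;; Interactively: C-x RET c cp850 RET C-x C-f FILE RET
;; From Lisp, bind `coding-system-for-read' around the visit.
(let ((coding-system-for-read 'cp850))
  (find-file "/path/to/euro-file.txt"))   ; hypothetical path
```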

--
Kevin Rodgers




