help-gnu-emacs

RE: Automatic recognition of some specific coding systems


From: Jürgen Hartmann
Subject: RE: Automatic recognition of some specific coding systems
Date: Thu, 26 Feb 2015 23:34:05 +0100

@Eli Zaretskii: Thank you very much for your profound assessment:

> It looks like what you want is beyond the current capabilities of
> Emacs's auto-detection of encoding.  See below for some alternatives.
>  
> Having said that...
>  
>> By the way, could you verify, that this is possible with Emacs 22.3
>> with the customization described in my previous post?
>  
> ...no, it doesn't work for me.  The latin-9 file is decoded using my
> locale's encoding (which isn't latin-9), and cp850 file is still
> raw-text.

Oops, this is an important finding indeed.

> So I think some other factor(s) is/are at work on your system.  Your
> locale's encoding is certainly one of them, but I think there should
> be something else, either in your customizations or somewhere else.

I just repeated the tests with Emacs 22.3 using the POSIX locale,

   LC_ALL=C ./emacs -q

and you are right: the cp850 file was now recognized as raw-text. The
locale I used before was

   de_DE.UTF-8

The more I get involved in this topic, the more I see that it is much
more complex than I thought at first glance.

> In general, even if Emacs 22.3 was capable to do the job, I think it
> was by sheer luck, and is anyway fragile, since the same
> customizations don't work for me (and AFAIU, aren't supposed to work).
> So I would suggest to explore alternative ways of doing this in Emacs
> 24 reliably.

This sounds reasonable to me. Besides the aspect of reliability, which
is of course the most important one, doing so might also yield a
solution that is likely to survive future updates.

> Some possibilities you may wish to explore:
>  
>   . Put a 'coding: cp850' cookie in the cp850 files

I would rather avoid altering the files' content for a purely
technical reason.

>   . If the names of the cp850 files all match some common pattern, you
>     can use modify-coding-system-alist to tell Emacs to decode them by
>     cp850

Unfortunately, in my case there is no such pattern in the file names
that would indicate which encoding a given file uses.

>   . Similarly, if the cp850 files' contents match some common regexp,
>     you can customize auto-coding-regexp-alist to force their decoding
>     by cp850

That one might do the trick: In my case the only files (at least in
the big picture) that use the DOS EOL variant are those encoded with
cp850 and vice versa. So one could think about a regular expression
that matches this unique EOL pattern.
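
A minimal sketch of that idea, assuming that CR LF line endings really
do occur only in the cp850 files. It is untested against the files
discussed here; the choice of the cp850-dos variant (cp850 with DOS
end-of-line conversion) is an assumption of this sketch, not something
suggested in the thread.

```elisp
;; Sketch only, not tested against the files discussed in this thread.
;; Assumption: every file containing a CR LF pair is cp850-encoded.
;; "\r\n" in an Emacs Lisp string is the two-character sequence CR LF.
;; cp850-dos is cp850 with DOS (CR LF) end-of-line conversion.
(add-to-list 'auto-coding-regexp-alist
             '("\r\n" . cp850-dos))
```

Two caveats: as far as I understand, the regexp is tried against the
beginning of the file, so a CR LF pair has to occur early enough to be
seen; and any file in another encoding that happens to use DOS line
endings would be decoded as cp850 as well.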

> Of course, you can always turn the table, and do the above for
> latin-9, while keeping cp850 in set-coding-system-priority call.  It
> all depends which one of these 2 lends itself better to one of these
> methods.
>  
> I believe that if one of these alternatives can do the job for you,
> the result will be much more reliable.

I also think so.

So I have to play around a little to get acquainted with the
construction of regular expressions for Emacs. I will be back when I
have gained deeper insight or, ideally, found a concrete solution.

Meanwhile, I would like to thank you, Eli Zaretskii, very much for the
time and effort you spent providing me with this thorough analysis
and your valuable suggestions.

Juergen
