Re: Ediff problem with accents


From: Peter Dyballa
Subject: Re: Ediff problem with accents
Date: Fri, 22 Sep 2006 12:42:39 +0200


On 22.09.2006 at 11:20, Sébastien Vauban wrote:

Hello Peter,

Sorry for the long delay... but it was impossible for me to run
the tests I wanted until now.

    Note -- You can copy this mail to gnus.emacs.help. I don't
            have news access from where I am now...

FYI, I've sanitized my .emacs section about the coding systems
(you'll see an extract beneath), and I've made a lot of
comparisons.

I still have the problem, but here follows a deeper insight on
what I'm experiencing:

    o   if (prefer-coding-system 'iso-latin-9),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-0:%%             |-0\--              |  (modeline)
        ----------------------------------------
          iso-latin-9?       iso-latin-9-dos

The modeline is correct: ISO 8859-15 (ISO Latin-9) encoding is used. The left buffer is read-only, and the right one is unmodified? The "\" puzzles me, but it has been so long since I used GNU Emacs on some MS Losedows that I cannot remember. The ^M in the left buffer should not appear; the encoding of the right buffer is probably the correct one, since it also displays présente properly.



    o   if (prefer-coding-system 'utf-8),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-u:%%             |-1\--              |
        ----------------------------------------
          utf-8?             iso-latin-1-dos

Again, the mode-lines are right, and the left buffer needs to be specified as utf-8-dos to make the ^M disappear and make pr\351sente appear correctly. The prefer-coding-system function accepts suffixes like -dos, -mac, and -unix to specify the preferred encoding exactly, including its line endings.
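One way to force a specific coding system when visiting a file is to bind coding-system-for-read around the call. This is a minimal sketch, not a fix tailored to Ediff: the helper name my-find-file-as and the file name in the example are made up for illustration.

```elisp
;; Sketch only: `my-find-file-as' is a hypothetical helper name.
(defun my-find-file-as (file coding)
  "Visit FILE, decoding it with CODING instead of letting Emacs guess."
  (let ((coding-system-for-read coding))
    (find-file file)))

;; Example (hypothetical file name): treat CR LF as the line ending
;; and decode the bytes as UTF-8.
;; (my-find-file-as "presente.txt" 'utf-8-dos)
```

If a buffer already visits the file, find-file returns that buffer without reading the file again, so kill the buffer first. Interactively, C-x RET c utf-8-dos RET C-x C-f does the same for a single command.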



    o   if I don't set any preferred coding system (commented line),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-1\%%             |-1\--              |
        ----------------------------------------
          iso-latin-1-dos?   iso-latin-1-dos

Here certainly the left buffer is not -dos – otherwise the ^M would not appear there.


To indicate the coding system under the window, I used

    M-x describe-coding-system RET RET

but, for the base version, it states "not set locally, use the
default"; that's why I wrote the default coding system for new
files and put a question mark after it (because I'm not sure
this is the correct way to do it).

There are no local settings in the file (see below), so a default is assumed, which the *Help* buffer should describe:

        Coding system for saving this buffer:
          0 -- iso-latin-9-unix
        
        Default coding system (for new files):
          u -- mule-utf-8-unix
        
        Coding system for keyboard input:
          nil
        Coding system for terminal output:
          u -- mule-utf-8 (alias: utf-8)
        
        Defaults for subprocess I/O:
          decoding: u -- mule-utf-8 (alias: utf-8)
        
          encoding: u -- mule-utf-8 (alias: utf-8)
        
        
        Priority order for recognizing coding systems when reading files:
          .
          .
          .
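The first entry of that *Help* output can also be read directly from the buffer. The following is a small sketch to evaluate with M-: in the buffer in question; it only reports values and changes nothing.

```elisp
;; Returns a list: the coding system used for saving this buffer,
;; followed by its end-of-line type.
(list buffer-file-coding-system
      ;; 0 = unix (LF), 1 = dos (CR LF), 2 = mac (CR);
      ;; a vector means the end-of-line type is still undecided.
      (and buffer-file-coding-system
           (coding-system-eol-type buffer-file-coding-system)))
```

Running it in both Ediff buffers should show quickly whether they really differ in their line-ending convention.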


The prefer-coding-system setting also affects your old files: they can now be interpreted differently than when they were created and saved. You could stick with iso-latin-9-dos to have the € and keep your old files unchanged. Every new (and old) file will have some extra ^M bytes, but at least new and old ones will be treated equally. (Conversion could be done on the command line (recode, iconv) or, taking more time, with GNU Emacs: Options menu -> Mule -> Set Coding Systems.)


So, you can see that, whatever I do, I can't compare my buffers
in a normal way... I'm completely lost...

Try: (prefer-coding-system 'iso-latin-9-dos).

You can also use several of these calls, each with a different encoding. GNU Emacs will then choose from this list first and only afterwards try to find another encoding.
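As a sketch of such a list, assuming Latin-9 files with DOS line endings are the common case and UTF-8 is the fallback: each call puts its coding system at the front of the priority list, so the call made last wins.

```elisp
;; Later calls take precedence, so list the fallback first.
(prefer-coding-system 'utf-8)            ; next in priority
(prefer-coding-system 'iso-latin-9-dos)  ; highest priority, CR LF line ends
```

The resulting order appears under "Priority order for recognizing coding systems when reading files" in the output of M-x describe-coding-system.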


PS- As promised, an extract of my .emacs config file:

,----[ my Emacs Init File ]
|
| (message "26 International Character Set Support...")
|
| ;; default input method for multilingual text
| (setq default-input-method "latin-9-prefix")

I do not use any input method: my keyboard composes é when I press the dead key ´ first and then the e. This also works for some other accented characters. Actually, I think I have never used any Emacs input method. 20 years ago I had DEC or Sun keyboards with a Compose key; now the X server lets me enter other characters with alt or shift-alt pressed ...

|
| ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
| (GNUEmacs
|     (try-require 'un-define))

This was necessary with GNU Emacs 20. The recent 21.x versions have MULE more or less built in. It could be that this line causes a lot of your trouble. (A good way to test the built-in capabilities is to launch GNU Emacs with -Q: no site or user specific initialisation files are used. And it might perform better ...)

|
| (add-to-list 'file-coding-system-alist
|              '("\\.owl\\'" utf-8 . utf-8))

This obviously only affects .owl files.

| ;; In GNU Emacs, when you specify the coding explicitly in the file, that
| ;; overrides `file-coding-system-alist'. Not in XEmacs?
|
| ;; ;; default coding system (for new files)
| (GNUEmacs
|     (prefer-coding-system 'utf-8))

You might consider adding -dos, but it is more important to understand that this change will make a lot of your old files unusable. In UTF-8 only the 7-bit ASCII range is encoded in one octet. All 8-bit characters from the ISO Latin encodings are encoded in two octets (or even three, for example the €). Your é is encoded as C3 A9 (and € as the well-known E2 82 AC). If GNU Emacs only sees E9 (or A4 for €), it will make mistakes! If you switch to UTF-8 you would need to convert all text files first, or preserve their old encodings by adding a header line like this as the first line:

         -*- mode: Text; coding: iso-8859-15; -*-

The mode part is not necessary (could also be tex or latex), but coding *is*. The other option is 'local variables' in the file's footer:

        %%% Local Variables:
        %%% coding: iso-8859-15
        %%% mode: tex
        %%% End:

You might also need to teach GNU Emacs that these local variables are 'safe'.
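The octet values quoted above for é and € can be checked from within Emacs. This is a sketch for a reasonably recent GNU Emacs; it assumes the forms themselves are read correctly, so that the string literals really contain é and €. Evaluate each form with C-x C-e, for example in *scratch*.

```elisp
;; Encode one character and show the resulting octets in hexadecimal.
(mapcar (lambda (byte) (format "%02X" byte))
        (encode-coding-string "é" 'utf-8))        ; => ("C3" "A9")

(mapcar (lambda (byte) (format "%02X" byte))
        (encode-coding-string "€" 'utf-8))        ; => ("E2" "82" "AC")

(mapcar (lambda (byte) (format "%02X" byte))
        (encode-coding-string "é" 'iso-8859-15))  ; => ("E9")

(mapcar (lambda (byte) (format "%02X" byte))
        (encode-coding-string "€" 'iso-8859-15))  ; => ("A4")
```

A lone E9 or A4 is not a valid UTF-8 sequence, which is why a Latin-9 file read as UTF-8 shows up with escapes like \351 (octal for E9).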

|
| (GNUEmacs
|     ;; to copy and paste outside Emacs
|     (set-clipboard-coding-system 'iso-latin-9))  ;; aka iso-8859-15

This depends on the windowing system you use. Now, I think, most will use UTF-8 ...

|
| ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
| ;; (with the euro) is the same as the é in Latin-1 (without the euro)
| ;; [avoid the small accentuated characters]
| (when (try-require 'ucs-tables)
|     (unify-8859-on-encoding-mode 1)  ;; harmless
|     (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
|                                      ;; contain different Latin-N charsets
|                                      ;; which should not be unified

I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...

|
| (when window-system
|   ;; functions for dealing with char tables
|   (require 'disp-table))

This might have been useful in GNU Emacs 20 and before. I never used it, except maybe for european-display or the like. And I also avoid set-language-environment: this is close to obsolete, to put it politely.

|
| (XEmacs
|     (require 'iso-syntax))

I'm not really an XEmacs user, but I think this is also something from the past, 20th century or before.


GNU Emacs 22.0.50 might serve you better. Both GNU Emacsen, 22.0.50 and 21.3, work better and configure themselves better when they can read environment variables like LC_CTYPE, LANG, or LC_ALL that describe the environment they are running in. Then you only need to specify exceptions to this general rule. I have LC_CTYPE=de_DE.UTF-8 in my environment ...
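To see what GNU Emacs actually inherited from the environment, something like the following sketch can be evaluated in *scratch*. It only reads values; an unset variable shows up as nil.

```elisp
;; Which locale-related variables did Emacs inherit from its environment?
(mapcar (lambda (var) (cons var (getenv var)))
        '("LC_ALL" "LC_CTYPE" "LANG"))

;; The coding system Emacs derived from the locale at startup.
locale-coding-system
```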

--
Greetings

  Pete

A morning without coffee is like something without something else.






