Re: Ediff problem with accents

help-gnu-emacs

From:	Peter Dyballa
Subject:	Re: Ediff problem with accents
Date:	Fri, 22 Sep 2006 12:42:39 +0200


On 22.09.2006 at 11:20, Sébastien Vauban wrote:

Hello Peter,

Sorry for the long delay... but it was impossible for me to make
the wished tests until now.

    Note -- You can copy this mail to gnus.emacs.help. I don't
            have news access from where I am now...

FYI, I've sanitized my .emacs section about the coding systems
(you'll see an extract beneath), and I've made a lot of
comparisons.

I still have the problem, but here follows a deeper insight on
what I'm experiencing:

    o   if (prefer-coding-system 'iso-latin-9),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-0:%%             |-0\--              |  (modeline)
        ----------------------------------------
          iso-latin-9?       iso-latin-9-dos

That's correct for the modeline: the ISO 8859-15 (ISO Latin-9) encoding is used. The left buffer is read-only, the right one is not changed? The ``\´´ puzzles me, but it has been such a long time since I used GNU Emacs on some MS Losedows that I cannot remember. The ^M in the left buffer should not appear; probably the right value for the encoding is that of the right buffer, which also presents présente the right way.



    o   if (prefer-coding-system 'utf-8),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-u:%%             |-1\--              |
        ----------------------------------------
          utf-8?             iso-latin-1-dos

Again, the mode-lines are right, and the left buffer needs to be specified as utf-8-dos to make the ^M disappear and make pr\351sente appear correctly. The prefer-coding-system function allows the use of "extensions" like -dos, -mac, -unix to specify exactly the preferred encoding.



    o   if I don't set any preferred coding system (commented line),
        then I see the following when ediff'ing:

        ----------------------------------------
        | ^M               |                   |
        | pr\351sente ^M   | présente          |
        | ^M               |                   |
        |                  |                   |
        |------------------|-------------------|
        |-1\%%             |-1\--              |
        ----------------------------------------
          iso-latin-1-dos?   iso-latin-1-dos

Here the left buffer is certainly not -dos; otherwise the ^M would not appear there.
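
As a rough sketch of how the left buffer could be re-read with an explicit -dos coding system: this assumes the buffer visits a file (which may not hold for a version-control base copy), and iso-latin-1-dos is only a guess taken from the right-hand mode-line. The same command is available interactively as C-x RET r.

```elisp
;; Sketch only: evaluate with M-: while the left (base) buffer is current.
;; The buffer must be visiting a file; Emacs asks for confirmation.
;; The file is re-read and decoded as Latin-1 with CRLF line endings,
;; so the trailing ^M should no longer be displayed.
(revert-buffer-with-coding-system 'iso-latin-1-dos)
```

If the ^M and the \351 disappear afterwards, the automatic detection was the problem and not the file itself.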


To indicate the coding system under the window, I used

    M-x describe-coding-system RET RET

but, for the base version, it states "not set locally, use the
default"; that's why I wrote the default coding system for new
files and put a question mark after it (because I'm not sure
this is the correct way to do it).

There are no local settings in the file (see below), so some default is assumed, which the *Help* buffer should describe:


        Coding system for saving this buffer:
          0 -- iso-latin-9-unix
        
        Default coding system (for new files):
          u -- mule-utf-8-unix
        
        Coding system for keyboard input:
          nil
        Coding system for terminal output:
          u -- mule-utf-8 (alias: utf-8)
        
        Defaults for subprocess I/O:
          decoding: u -- mule-utf-8 (alias: utf-8)
        
          encoding: u -- mule-utf-8 (alias: utf-8)
        
        
        Priority order for recognizing coding systems when reading files:
          .
          .
          .

The prefer-coding-system setting also affects your old files: they can now be interpreted differently than when they were created and saved. You could continue to stick with iso-latin-9-dos to have the € and keep your old files unchanged. Every new (and old) file will have some extra ^M bytes, but at least new and old ones will be treated equally. (Conversion could be done on the command line (recode, iconv) or, more time-consuming, with GNU Emacs: Options menu -> Mule -> Set Coding Systems.)


So, you can see that, whatever I do, I can't compare my buffers
in a normal way... I'm completely lost...


Try: (prefer-coding-system 'iso-latin-9-dos).

You can also use several of these calls, each with a different encoding. These make GNU Emacs choose from this list first and only then try to find another encoding.
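
A minimal sketch of such a sequence of calls for an init file. The particular coding systems and their order are only an assumption about what might suit this setup; each call moves its coding system to the front of the priority list, so the last call ends up with the highest priority.

```elisp
;; Sketch only: the selection and order are illustrative assumptions.
;; Each call puts its coding system at the front of the priority
;; list, so the LAST call wins when several candidates fit a file.
(prefer-coding-system 'utf-8)            ; lowest priority of the three
(prefer-coding-system 'iso-latin-1-dos)
(prefer-coding-system 'iso-latin-9-dos)  ; highest priority
```

The resulting order can be inspected at the end of the *Help* buffer shown by M-x describe-coding-system RET RET.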


PS- As promised, an extract of my .emacs config file:

,----[ my Emacs Init File ]
|
| (message "26 International Character Set Support...")
|
| ;; default input method for multilingual text
| (setq default-input-method "latin-9-prefix")

I do not use any input method: my keyboard creates/composes é by pressing the dead key ´ first and then the e. This also works for some other accented characters. Actually, I think I never used any Emacs input method. 20 years ago I had DEC or Sun keyboards with a Compose key; now the X server allows other characters to be typed with alt or shift-alt pressed ...

|
| ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
| (GNUEmacs
|     (try-require 'un-define))

This was necessary with GNU Emacs 20. The recent 21.x versions have MULE somehow built in. It could be that this line causes a lot of your trouble. (A good way to test the built-in capabilities is to launch GNU Emacs with -Q: no site or user specific initialisation files are used. And it might perform better ...)

|
| (add-to-list 'file-coding-system-alist
|              '("\\.owl\\'" utf-8 . utf-8))


This obviously only affects .owl files.

| ;; In GNU Emacs, when you specify the coding explicitly in the file, that
| ;; overrides `file-coding-system-alist'. Not in XEmacs?
|
| ;; ;; default coding system (for new files)
| (GNUEmacs
|     (prefer-coding-system 'utf-8))

You might consider adding -dos, but it's more important that you understand that this change will make a lot of your old files unusable. In UTF-8 only the 7 bit ASCII range is encoded by one octet. All 8 bit characters from the ISO Latin encodings are encoded by two octets (or even three, for example the €). Your é is encoded as C3 A9 (€ as the well known E2 82 AC). If GNU Emacs only sees E9 (or A4 for €), it will make mistakes! If you switch to UTF-8 you would need to convert all text files first, or preserve their old encodings by adding a header line like this as the first line:


         -*- mode: Text; coding: iso-8859-15; -*-

The mode part is not necessary (could also be tex or latex), but coding *is*. The other option is 'local variables' in the file's footer:


        %%% Local Variables:
        %%% coding: iso-8859-15
        %%% mode: tex
        %%% End:

You might also need to teach GNU Emacs that these local variables are 'safe'.
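
The byte values quoted above for é and € can be checked from within Emacs. The following is a small sketch for a reasonably recent GNU Emacs; the helper name my-hex-bytes is made up for this illustration and is not part of Emacs.

```elisp
;; Sketch only: `my-hex-bytes' is a hypothetical helper name.
(defun my-hex-bytes (string coding-system)
  "Return the bytes of STRING encoded in CODING-SYSTEM as hex text."
  (mapconcat (lambda (byte) (format "%02X" byte))
             (encode-coding-string string coding-system)
             " "))

(my-hex-bytes "é" 'utf-8)        ; => "C3 A9"
(my-hex-bytes "é" 'iso-8859-15)  ; => "E9"
(my-hex-bytes "€" 'utf-8)        ; => "E2 82 AC"
(my-hex-bytes "€" 'iso-8859-15)  ; => "A4"
```

E9 is 351 in octal, which is presumably why the undecoded byte shows up as pr\351sente in the left buffer.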

|
| (GNUEmacs
|     ;; to copy and paste outside Emacs
|     (set-clipboard-coding-system 'iso-latin-9))  ;; aka iso-8859-15

This depends on the windowing system you use. Now, I think, most will use UTF-8 ...

|
| ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
| ;; (with the euro) is the same as the é in Latin-1 (without the euro)
| ;; [avoid the small accentuated characters]
| (when (try-require 'ucs-tables)
|     (unify-8859-on-encoding-mode 1)  ;; harmless
|     (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
|                                      ;; contain different Latin-N charsets
|                                      ;; which should not be unified


I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...

|
| (when window-system
|   ;; functions for dealing with char tables
|   (require 'disp-table))

This might have been useful in GNU Emacs 20 and before. I never used it, except for european-display or such, maybe. And I also avoid set-language-environment: this is close to obsolete, to put it politely.

|
| (XEmacs
|     (require 'iso-syntax))

I'm not really an XEmacs user, but I think this is also something from the past, the 20th century or before.

GNU Emacs 22.0.50 might serve you better. Both GNU Emacsen, 22.0.50 and 21.3, work better and set themselves up better internally when they read environment variables like LC_CTYPE, LANG, or LC_ALL that describe the environment they are running in. Then you only need to specify exceptions from this general rule. I have LC_CTYPE=de_DE.UTF-8 in my environment ...


--
Greetings

  Pete

A morning without coffee is like something without something else.
