Re: command fill-paragraph deletes leading Umlauts if line begins with s
From: Ralf Angeli
Subject: Re: command fill-paragraph deletes leading Umlauts if line begins with space
Date: Thu, 23 Dec 2004 11:19:11 +0100
User-agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

* Ulrich Scholz (2004-12-22) writes:
> value of $LANG: en_US.ISO-8859-15
> locale-coding-system: nil
> default-enable-multibyte-characters: nil
>
> Please describe exactly what actions triggered the bug
> and the precise symptoms of the bug:
>
> The command changes the following paragraph
>
> Übersetzung Lösungsverfahren für eine spezielle Problemdomäne haben auch
> Probleme:
>
> to the paragraph
>
> bersetzung Lösungsverfahren für eine spezielle Problemdomäne haben
> auch Probleme:
>
> Note that the Ü of Übersetzung is missing in the second version. The
> bug eats any number of Umlauts, but only as first characters of the line after
> some spaces. Umlauts after the first non-Umlaut or in lines that begin with a
> non-space remain.
>
> I don't know how to get a list of all active modes. The bug occurs while
> editing a LaTeX file. I use AUCTeX and RefTeX. iso-accents-mode does not
> seem to cause the bug.

I can reproduce the behavior with CVS AUCTeX, but only if I force
Emacs (21.3 or CVS) to open the file in unibyte mode by using
`find-file-literally'. The problem is that in unibyte mode umlauts
are considered to have whitespace syntax. For example, typing `C-u
C-x =' on the first umlaut in your example gives

  character: Ü (0334, 220, 0xdc)
  charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
  code point: 220
  syntax:   which means: whitespace
  buffer code: 0xDC
  file code: 0xDC (encoded by coding system no-conversion)
  display: by display table entry [?Ü] (see below)

(In the buffer the character is displayed as "Ü".)
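
The same check can be done from Lisp instead of reading the `C-u C-x ='
output. This is only a small sketch; the name `my-whitespace-syntax-p'
is made up for illustration and is not part of Emacs or AUCTeX.

```elisp
;; Hypothetical helper, not part of Emacs or AUCTeX.
(defun my-whitespace-syntax-p (char)
  "Return non-nil if CHAR has whitespace syntax in the current buffer."
  ;; `char-syntax' returns the syntax class designator of CHAR;
  ;; for the whitespace class that designator is the space character.
  (eq (char-syntax char) ?\s))
```

With point on the character in question, `M-: (my-whitespace-syntax-p
(following-char))' reports whether the current syntax table treats it
as whitespace.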

A function in AUCTeX that does indentation uses whitespace syntax to
find the first non-whitespace character on a line (and so does
`back-to-indentation' in CVS Emacs). That means it skips over the "Ü"
as if it were part of the indentation and deletes everything from the
beginning of the line up to and including the "Ü".
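
The difference between the two ways of skipping indentation can be
sketched as follows. This is not the AUCTeX code, only an illustration:
the helper name `my-demo-text-after-indentation' is made up, and the
unibyte situation is imitated by giving "Ü" whitespace syntax in a
private syntax table.

```elisp
;; Hypothetical demo, not the actual AUCTeX code.
(defun my-demo-text-after-indentation (skipper)
  "Return what is left of a sample line after SKIPPER skipped its indentation."
  (with-temp-buffer
    ;; Use a private syntax table so the standard one is not modified.
    (set-syntax-table (make-syntax-table))
    ;; Assumption for the demo: imitate unibyte mode by giving the
    ;; umlaut whitespace syntax.
    (modify-syntax-entry ?Ü " ")
    (insert "  Übersetzung")
    (goto-char (point-min))
    (funcall skipper)
    (buffer-substring (point) (point-max))))

;; Syntax-based skipping treats the umlaut as indentation:
(my-demo-text-after-indentation
 (lambda () (skip-syntax-forward " ")))     ; => "bersetzung"

;; Skipping only literal spaces and tabs keeps it:
(my-demo-text-after-indentation
 (lambda () (skip-chars-forward " \t")))    ; => "Übersetzung"
```

Anything that deletes the region between the beginning of the line and
the position found by the first variant removes the umlaut as well.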

I removed this code in CVS AUCTeX, which now only uses
`back-to-indentation'. In Emacs 21.3 this function does not look at
character syntax but simply skips spaces and tab characters at the
beginning of a line. So unless you are using CVS Emacs (i.e. the
upcoming Emacs 21.4), your umlauts should be safe.

Anyway, do you really need the unibyte stuff? If you want to use
latin-1, latin-9 and other non-ASCII encodings, it is better to
run Emacs in multibyte mode. That means getting rid of the
--unibyte command line option, a nil value for
`default-enable-multibyte-characters', and things like
`(standard-display-european t)'. In multibyte mode, for example, `M-f'
works correctly, i.e. it does not stop at every umlaut.
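
A multibyte setup for latin-9 might look roughly like the following in
an init file. This is a sketch under assumptions: the choice of the
"Latin-9" language environment and of `iso-8859-15' as preferred coding
system is derived from the reported $LANG value, not from your actual
configuration.

```elisp
;; Sketch of a multibyte latin-9 setup; adjust to your own needs.
;; Do not set `default-enable-multibyte-characters' to nil and do
;; not call `(standard-display-european t)'.

;; Assumption: latin-9 is wanted because $LANG is en_US.ISO-8859-15.
(set-language-environment "Latin-9")
(prefer-coding-system 'iso-8859-15)
```

To check a buffer afterwards, evaluate `enable-multibyte-characters'
with `M-:' in it; a non-nil value means the buffer is multibyte.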
--
Ralf