bug-gnu-emacs



From: Stefan Monnier
Subject: bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
Date: Fri, 04 Feb 2011 16:34:44 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

> I think I'm starting to understand what is going on.

I think you're still confused, tho.

For some reason, you haven't replied to any of my emails, even tho it's
blatantly obvious that your "default enable-multibyte-characters" is the
main culprit (and the "default" part is important here: it means that
it comes from something you've explicitly changed in your Emacs config).

> I had created a long time ago a unibyte file containing the 1-byte
> characters I want to test within emacs.

You mean an iso-8859-1 file, then.  A unibyte file only contains bytes,
no chars.

> I started /usr/local/bin/emacs -Q mytestchars-224-255-iso-8859.txt
> under emacs  23.2.93.1 (i686-pc-linux-gnu)

> The file displays perfectly correctly. (describe-char (point)) gives me
> exactly what I want, i.e. an extended ASCII decimal code between 224 and 255.

The code is not very helpful here, since depending on whether the
current buffer is unibyte or multibyte, the 224 or 255 doesn't mean the
same thing.  So the second line "preferred charset:" is more important,
since it should either say "eight-bit" (i.e. a raw byte with no
associated meaning of it representing some kind of character) or
"iso-8859-1".

> Almost all operations (except capitalize, see below) work exactly as I wish

So that leads me to think the buffer is in unibyte mode.
If you started with "emacs -Q", the only explanation is that you have
EMACS_UNIBYTE set in your environment variables.  If that's the case,
then please get rid of it.

> At the beginning of this discussion, Sven explained that capitalize would
> only work on 2-byte characters.

unibyte-vs-multibyte is not the same as "1-byte chars"-vs-"2-byte chars".
It's an issue that's internal to Emacs and that's largely unrelated to
how Emacs stores chars (e.g. as an array of 32bit integers, or as
a sequence of bytes, with escape sequences to represent more than 256
different values).
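
As a rough analogy outside Emacs, here is a minimal Python sketch.  It
assumes Python's str/bytes split is close enough to the
multibyte/unibyte distinction for illustration; none of it is Emacs
code.  The same char can take 1 byte or 2 bytes depending on the
encoding, yet it remains the same char:

```python
# Analogy only: Python's str stands in for "chars", bytes for "raw bytes".
ch = "\u00e0"  # LATIN SMALL LETTER A WITH GRAVE

latin1 = ch.encode("iso-8859-1")  # stored as a single byte, 0xE0 (224)
utf8 = ch.encode("utf-8")         # stored as two bytes, 0xC3 0xA0

# How many bytes the storage needs is a separate question from
# whether we are handling chars or bytes.
sizes = (len(latin1), len(utf8))
same_char = latin1.decode("iso-8859-1") == utf8.decode("utf-8")
```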

> Which I tested of course, and of course it works, but I simply wish
> I could continue to capitalize (M-c) unibyte words like in the good
> old iso-8859 days!!

Most likely you won't tell the difference: the multibyte mode works just
as well for iso-8859 files.  "multibyte-mode" means "we're manipulating
chars", whereas unibyte mode means "we're manipulating bytes", where
bytes are simply numbers between 0 and 255.  Now you tell me: what does
it mean to capitalize the number 224?
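
The same question can be sketched in Python, again purely as an
analogy (Python's bytes and str stand in for unibyte and multibyte
text; this is not how Emacs implements anything).  Upcasing a raw byte
224 leaves it alone, whereas upcasing the char obtained by decoding it
as iso-8859-1 gives the capital letter:

```python
# Analogy only: bytes have no case outside ASCII, chars do.
raw = bytes([224])                # a bare byte, no charset attached
text = raw.decode("iso-8859-1")   # now a char: a with grave accent

upper_raw = raw.upper()    # bytes.upper() only maps ASCII a-z, so 224 stays 224
upper_text = text.upper()  # capital A with grave accent, code point 192
```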


        Stefan




