charsets and character sets (was: Re: 21.1: list-charset-chars)
From: Janusz S. Bień
Subject: charsets and character sets (was: Re: 21.1: list-charset-chars)
Date: 19 Feb 2002 11:03:21 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
On Mon, 18 Feb 2002 "Eli Zaretskii" <eliz@is.elta.co.il> wrote:
> > From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> > Date: Mon, 18 Feb 2002 15:58:51 +0100
> >
> > I found out that the result of list-charset-chars (e.g. for latin15) is
> > contrary to the documentation: Only characters > 127 are displayed, but
> > the name and documentation creates the impression that all characters
> > are listed.
>
> What led you to believe that ASCII characters with codes below 128
> belong to the other charsets? Whatever gave you that impression is
> the place where the documentation should be improved, because ASCII
> characters are a separate charset in Emacs.
On Tue, 19 Feb 2002 "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> wrote:
[...]
> "list charset chars": What else than listing the characters in the
> charset could be expected?
>
> Regards,
> Ulrich
The Emacs documentation fails to make a clear distinction between Emacs
charsets and character sets in the sense of ISO and related
standards.
The charset named e.g. latin15 *is not* the ISO/IEC Latin 15 character
set; it is just its right-hand part, registered as such in the ISO
International Register (available online) as ISO-IR 203. However, the
iso-8859-15 *coding system* is equivalent to ISO/IEC Latin 15, cf. the
output of `describe-coding-system':
------------------------------------------------------------------------------
0 -- iso-8859-15 (alias of iso-latin-9)
ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
Type: 2 (variant of ISO-2022)
Initial designations:
G0 -- ascii:ASCII (ISO646 IRV)
G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15):
ISO-IR-203
-----------------------------------------------------------------------------
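The same split into an ASCII half and a right-hand part can be checked outside Emacs. The following is only a minimal sketch in Python, which is not part of the original discussion; it assumes that Python's `iso8859-15` codec is an acceptable stand-in for the full 8-bit character set, and the variable names are chosen just for illustration.

```python
# Illustration only: Python's "iso8859-15" codec stands in for the full
# 8-bit character set. None of this is Emacs code.
left_half = bytes(range(0x00, 0x80))    # G0: the ASCII half
right_half = bytes(range(0xA0, 0x100))  # G1: the right-hand part (ISO-IR 203)

# The left half of ISO 8859-15 decodes exactly as ASCII does.
assert left_half.decode("iso8859-15") == left_half.decode("ascii")

# The right-hand part holds 96 characters, none of them ASCII.
right_chars = right_half.decode("iso8859-15")
assert len(right_chars) == 96
assert all(ord(ch) > 127 for ch in right_chars)

# The euro sign at 0xA4 is one place where Latin-9 differs from Latin-1.
euro = bytes([0xA4]).decode("iso8859-15")
```

Only the 96 characters in `right_chars` correspond to what the Emacs charset contains, which would explain why `list-charset-chars' shows nothing below 128.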
Long, long ago I proposed to change the names of the charsets
appropriately, but my suggestion was rejected and I didn't press the
point. I think now is the right time to come back to the
problem, as correct terminology is important for the development
work.
My current proposal is:
- make explicit in the manuals and documentation strings that
charsets are Emacs-specific technical terms,
- add `describe-charset' analogous to `describe-coding-system' to
minimize the chance of user confusion,
- on the first convenient occasion rename `latin-15' and related
charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
number of the ISO/IEC 8859 standard part which contains the
definition of Latin alphabet number 9, while `latin-15' suggests Latin
alphabet number 15; `rp' stands for `right-hand part of',
which is an ISO/IEC technical term).
Best regards
Janusz
--
dr hab. Janusz S. Bień, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.