bug-gnu-emacs
charsets and character sets (was: Re: 21.1: list-charset-chars)


From: Janusz S. Bień
Subject: charsets and character sets (was: Re: 21.1: list-charset-chars)
Date: 19 Feb 2002 11:03:21 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

On Mon, 18 Feb 2002  "Eli Zaretskii" <eliz@is.elta.co.il> wrote:

> > From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> > Date: Mon, 18 Feb 2002 15:58:51 +0100
> > 
> > I found out that the result of list-charset-chars (e.g. for latin15) is 
> > contrary to the documentation: Only characters > 127 are displayed, but 
> > the name and documentation creates the impression that all characters 
> > are listed.
> 
> What led you to believe that ASCII characters with codes below 128
> belong to the other charsets?  Whatever gave you that impression is
> the place where the documentation should be improved, because ASCII
> characters are a separate charset in Emacs.

On Tue, 19 Feb 2002  "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> wrote:

[...]

> "list charset chars": What else than listing the characters in the 
> charset could be expected?
> 
> Regards,
> Ulrich

The Emacs documentation fails to make a clear distinction between
Emacs charsets and character sets in the sense of ISO and related
standards.

The charset named e.g. latin15 *is not* the ISO/IEC 8859-15 character
set; it is just its right-hand part, registered as such in the ISO
International Register (available online) as ISO-IR 203. However, the
iso-8859-15 *coding system* is equivalent to ISO/IEC 8859-15, cf. the
output of `describe-coding-system':

------------------------------------------------------------------------------
0 -- iso-8859-15 (alias of iso-latin-9)
  ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
Type: 2 (variant of ISO-2022)
Initial designations:
  G0 -- ascii:ASCII (ISO646 IRV)
  G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203
-----------------------------------------------------------------------------
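
The same split can be observed outside Emacs. The following is only a
minimal sketch in Python, not Emacs code: it assumes Python's built-in
"iso8859-15" codec, and the function names left_hand_part and
right_hand_part are hypothetical, chosen for illustration. It shows
that the 8-bit coding system consists of ASCII in the lower half plus
a separate set of 96 characters (the part registered as ISO-IR 203) in
positions 0xA0 to 0xFF.

```python
# Sketch only: uses Python's built-in "iso8859-15" codec, not Emacs.
# The helper names below are hypothetical.

def left_hand_part():
    """Decode bytes 0x00..0x7F; corresponds to the G0 designation (ASCII)."""
    return bytes(range(0x00, 0x80)).decode("iso8859-15")


def right_hand_part():
    """Decode bytes 0xA0..0xFF; corresponds to the G1 designation,
    i.e. the right-hand part registered as ISO-IR 203."""
    return bytes(range(0xA0, 0x100)).decode("iso8859-15")


ascii_half = left_hand_part()
upper_half = right_hand_part()

# The lower half of the coding system is plain ASCII.
print(ascii_half == bytes(range(0x80)).decode("ascii"))

# The right-hand part holds 96 characters, none of them ASCII.
print(len(upper_half), all(ord(c) > 127 for c in upper_half))

# Position 0xA4 is the euro sign in ISO/IEC 8859-15.
print(upper_half[0xA4 - 0xA0])
```

Listing only upper_half corresponds to what list-charset-chars shows
for the Emacs charset, while the coding system covers both halves.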

Long, long ago I proposed renaming the charsets appropriately, but my
suggestion was rejected and I didn't press the point. I think now is
the right time to come back to the problem, as correct terminology is
important for the development work.

My current proposal is:

- make explicit in the manuals and documentation strings that
  charsets are Emacs-specific technical terms,

- add `describe-charset', analogous to `describe-coding-system', to
  minimize the chance of user confusion,

- on the first convenient occasion rename `latin-15' and related
  charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
  number of the part of the ISO/IEC 8859 standard which contains the
  definition of Latin alphabet number 9, while `latin-15' suggests
  Latin alphabet number 15; `rp' stands for `right-hand part of',
  which is an ISO/IEC technical term).

Best regards

Janusz

-- 
                     ,   
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.


