bug-gnu-emacs

Re: charsets and character sets (was: Re: 21.1: list-charset-chars)


From: Janusz S. Bień
Subject: Re: charsets and character sets (was: Re: 21.1: list-charset-chars)
Date: 19 Feb 2002 19:42:36 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

I quote my letter in full because I intended to send it to emacs-devel
as well, but forgot to add that list to the addressees.

On 19 Feb 2002  jsbien@mimuw.edu.pl (Janusz S. Bień) wrote:

> On Mon, 18 Feb 2002  "Eli Zaretskii" <eliz@is.elta.co.il> wrote:
> 
> > > From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> > > Date: Mon, 18 Feb 2002 15:58:51 +0100
> > > 
> > > I found out that the result of list-charset-chars (e.g. for latin15) is 
> > > contrary to the documentation: Only characters > 127 are displayed, but 
> > > the name and documentation creates the impression that all characters 
> > > are listed.
> > 
> > What led you to believe that ASCII characters with codes below 128
> > belong to the other charsets?  Whatever gave you that impression is
> > the place where the documentation should be improved, because ASCII
> > characters are a separate charset in Emacs.
> 
> On Tue, 19 Feb 2002  "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> wrote:
> 
> [...]
> 
> > "list charset chars": What else than listing the characters in the 
> > charset could be expected?
> > 
> > Regards,
> > Ulrich
> 
> The Emacs documentation fails to make a clear distinction between Emacs
> charsets and character sets in the sense of ISO and related
> standards.
> 
> The charset named e.g. latin15 *is not* the ISO/IEC Latin 15 character
> set; it is just its right-hand part, registered as such in the ISO
> International Register (available online) as ISO-IR 203. However, the
> iso-8859-15 *coding system* is equivalent to ISO/IEC Latin 15, cf. the
> output of `describe-coding-system':
> 
> ------------------------------------------------------------------------------
> 0 -- iso-8859-15 (alias of iso-latin-9)
>   ISO 2022 based 8-bit encoding for Latin-9 (MIME:ISO-8859-15)
> Type: 2 (variant of ISO-2022)
> Initial designations:
>   G0 -- ascii:ASCII (ISO646 IRV)
>   G1 -- latin-iso8859-15:Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203
> -----------------------------------------------------------------------------
> 
> Long, long ago I proposed to rename the charsets appropriately, but
> my suggestion was rejected and I didn't press the point. I think now
> is the right time to come back to the problem, as correct terminology
> is important for the development work.
> 
> My current proposal is:
> 
> - make explicit in the manuals and documentation strings that
>   charsets are Emacs-specific technical terms,
> 
> - add `describe-charset', analogous to `describe-coding-system', to
>   minimize the chance of user confusion,
> 
> - on the first convenient occasion rename `latin-15' and related
>   charsets to something more adequate, e.g. `latin-no9-rp' (15 is the
>   number of the ISO/IEC 8859 standard part which contains the
>   definition of Latin alphabet number 9, while `latin-15' suggests
>   Latin alphabet number 15; `rp' stands for `right-hand part of',
>   which is an ISO/IEC technical term).
> 
> Best regards
> 
> Janusz
> 
> -- 
>                      ,   
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw University
> http://www.orient.uw.edu.pl/~jsbien/
> ---------------------------------------------------------------------
> Na tym koncie czytam i wysylam poczte i wiadomosci offline.
> On this account I read/post mail/news offline.

On Tue, 19 Feb 2002  "Eli Zaretskii" <eliz@is.elta.co.il> wrote:


[...]

> > I don't have a v21 Emacs at hand at the moment, but an ISO 8859-15 
> > charset is a superset of US-ASCII
> 
> Not in Emacs, it isn't.  

Because a charset *is not* a character set.

> The full name of latin-iso8859-15 in Emacs
> is this:
> 
>   "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203."
> 
> See mule-conf.el for more information.  The ``right-hand part'' thing
> means that characters below 128 are not included.

In other words, the charset name is not adequate.
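
To illustrate what "right-hand part" means, here is a minimal sketch
in Python (not Emacs Lisp, and not how Emacs implements it). It
assumes Python's standard `iso8859-15' codec as a stand-in for the
8-bit character code; the helper name `split_8bit_code' is
hypothetical. It splits the code into its ASCII half and its 96
right-hand positions (bytes 0xA0 to 0xFF):

```python
def split_8bit_code(codec="iso8859-15"):
    """Split an 8-bit character code into its two 7-bit halves.

    Returns (left, right): `left` holds the 128 characters at bytes
    0x00-0x7F, `right` the 96 characters at bytes 0xA0-0xFF, i.e. the
    "right-hand part" in the ISO/IEC sense.
    """
    left = bytes(range(0x00, 0x80)).decode(codec)
    right = bytes(range(0xA0, 0x100)).decode(codec)
    return left, right


left, right = split_8bit_code()

# The left half is plain ASCII; nothing in it is specific to Latin-9.
is_ascii_half = left == bytes(range(0x80)).decode("ascii")

# The euro sign lives in the right-hand part, at byte 0xA4.
euro = right[0xA4 - 0xA0]
```

Under this view, a command that lists only the right-hand charset
would show the 96 characters in `right' and none of those in `left'.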

> What I'm asking is where would you suggest to explain this
> fundamental fact so that it becomes clear.

For example, after

-------------------------------------------------------------------------
International Character Set Support
***********************************

   Emacs supports a wide variety of international character sets,
including European variants of the Latin alphabet, as well as Chinese,
Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts.  These
features have been merged from the modified version of Emacs known as
MULE (for "MULti-lingual Enhancement to GNU Emacs")
------------------------------------------------------------------------

add

        To implement character set support, Emacs uses the notion
        of a charset. For historical reasons most 8-bit character codes
        are considered to consist of two separate 7-bit charsets,
        namely ASCII and the so-called right-hand part of the
        appropriate character code, for example...

        Please note also that characters belonging to different
        charsets are always different, even if they look the same: the
        letter o with acute accent from Latin alphabet no 1 (charset
        `latin-no1-rp', intended to be used e.g. for French) is
        different from the letter o with acute accent from Latin
        alphabet no 2 (charset `latin-no2-rp', intended to be used
        e.g. for Polish).
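
The point of the last paragraph can be modelled with a small sketch,
again in Python and only as an illustration: a character is
identified by the pair (charset, 7-bit code), so two characters from
different charsets never compare equal even when they look the same.
The type `CharsetChar', the helpers `from_8bit' and `glyph', and the
mapping of the proposed charset names to Python codecs are all
assumptions made for this example:

```python
from typing import NamedTuple


class CharsetChar(NamedTuple):
    charset: str  # name of the 7-bit right-hand-part charset
    code: int     # 7-bit position within that charset (0x20-0x7F)


# Assumed mapping from the charset names proposed above to Python codecs.
CODECS = {"latin-no1-rp": "iso8859-1", "latin-no2-rp": "iso8859-2"}


def from_8bit(charset: str, byte: int) -> CharsetChar:
    """Build a character from a byte in the right-hand part (0xA0-0xFF)."""
    if not 0xA0 <= byte <= 0xFF:
        raise ValueError("byte is not in the right-hand part")
    return CharsetChar(charset, byte & 0x7F)


def glyph(ch: CharsetChar) -> str:
    """Return what the character looks like, via the matching codec."""
    return bytes([ch.code | 0x80]).decode(CODECS[ch.charset])


# Byte 0xF3 is "o with acute accent" in both Latin alphabets.
french_o = from_8bit("latin-no1-rp", 0xF3)
polish_o = from_8bit("latin-no2-rp", 0xF3)

same_look = glyph(french_o) == glyph(polish_o)  # they look identical
same_char = french_o == polish_o                # but they are distinct
```

Here `same_look' is true while `same_char' is false, which is exactly
the distinction the added manual text tries to make explicit.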

Best regards

Janusz

-- 
                     ,   
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
http://www.orient.uw.edu.pl/~jsbien/
---------------------------------------------------------------------
Na tym koncie czytam i wysylam poczte i wiadomosci offline.
On this account I read/post mail/news offline.


