bug-gnu-emacs

From: Eli Zaretskii
Subject: bug#24603: [PATCHv6 5/6] Support casing characters which map into multiple code points (bug#24603)
Date: Wed, 22 Mar 2017 18:06:47 +0200

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Tue, 21 Mar 2017 02:27:08 +0100
> 
> Implement unconditional special casing rules defined in Unicode standard.

Thanks.  A few comments below.

> diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
> index 3c5119a8a3d..32b05eacce6 100644
> --- a/admin/unidata/unidata-gen.el
> +++ b/admin/unidata/unidata-gen.el
> @@ -268,6 +268,33 @@ unidata-prop-alist
>  The value nil means that the actual property value of a character
>  is the character itself."
>       string)
> +    (special-uppercase
> +     2 unidata-gen-table-special-casing "uni-special-uppercase.el"
> +     "Unicode unconditional special casing mapping.
> +
> +Property value is nil, denoting no special rules, or a string, denoting
> +characters maps into given sequence of characters.

The last sentence is ungrammatical ("denoting characters maps into
given sequence of characters").  (This problem repeats in other
similar sentences in the patch.)

> +The mapping includes only unconditional casing rules defined by Unicode."

This begs for clarification: what is meant by "unconditional casing"?
I think a sentence or two of explanation is due.
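
For reference, the distinction at issue can be sketched as follows.  In
the Unicode SpecialCasing.txt data, unconditional mappings apply
regardless of context or language, while conditional ones depend on the
surrounding text (e.g. Final_Sigma) or on the language.  The snippet
below is only an illustration in Python, not Emacs code and not part of
the patch; it assumes that Python's str.upper and str.lower apply the
full Unicode case mappings, and the variable names are made up for the
example.

```python
# Unconditional special casing: the result does not depend on context
# or language, but one character may map to several code points.
sharp_s_upper = "\u00df".upper()       # LATIN SMALL LETTER SHARP S
ligature_fi_upper = "\ufb01".upper()   # LATIN SMALL LIGATURE FI

# Conditional (context-dependent) casing: GREEK CAPITAL LETTER SIGMA
# lowercases to final sigma at the end of a word, and to the regular
# small sigma elsewhere.
sigma_at_end = "\u0391\u03a3".lower()    # ALPHA followed by SIGMA
sigma_at_start = "\u03a3\u0391".lower()  # SIGMA followed by ALPHA

for label, value in [
    ("sharp s upcased", sharp_s_upper),
    ("fi ligature upcased", ligature_fi_upper),
    ("sigma at end of word", sigma_at_end),
    ("sigma at start of word", sigma_at_start),
]:
    # Print code points so that the length of each result is visible.
    print(label, [f"U+{ord(ch):04X}" for ch in value])
```

The first two results are two code points long although the input was a
single character, which is the case the new properties are meant to
cover.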

> +@item special-uppercase
> +Corresponds to Unicode unconditional special upper-casing rules.  The value

Likewise here: the "unconditional" part should be explained.

> +is @code{"SS"}.  For unassigned codepoints, the value is @code{nil}
> +which means @code{uppercase} property needs to be consulted instead.

When you say "unassigned codepoints", do you mean codepoints that
don't have characters defined for them in Unicode?  Because that's the
usual meaning of this term in the context of Unicode.  If you mean
something else, please use some other term.  (I think you mean
something else, since properties of unassigned codepoints are not
really interesting for Lisp programmers.)

> +mapping for @code{U+0130} (@sc{latin capital letter i with dot above})
> +the value is @code{"i\u0307"}.  For unassigned codepoints, the value is

Instead of using "i\u0307", in the hope that the reader will
understand it's a string made of 2 characters, I would say that
explicitly.
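
To make the point concrete, here is a small illustration (in Python,
not Emacs code and not part of the patch; it assumes Python's str.lower
applies the full Unicode lower-casing for U+0130) that spells out the
two characters the string consists of:

```python
import unicodedata

# Lower-case LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
lowered = "\u0130".lower()

# List each resulting character by code point and Unicode name.
names = [unicodedata.name(ch) for ch in lowered]
for ch, name in zip(lowered, names):
    print(f"U+{ord(ch):04X} {name}")
```

A description along those lines, naming both characters, leaves no
doubt that the value is a two-character string.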

>  DEFUN ("upcase", Fupcase, Supcase, 1, 1, 0,
>         doc: /* Convert argument to upper case and return that.
>  The argument may be a character or string.  The result has the same type.
> -The argument object is not altered--the value is a copy.
> +The argument object is not altered--the value is a copy.  If argument
> +is a character, characters which map to multiple code points when
> +cased, e.g. ﬁ, are returned unchanged.
>  See also `capitalize', `downcase' and `upcase-initials'.  */)

Using non-ASCII characters here requires adding a 'coding' cookie to
the file's first line.  (C sources are not by default decoded as
UTF-8, unlike Lisp files.)
