bug#24603: [RFC 15/18] Base lower- and upper-case tests on Unicode prope

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#24603: [RFC 15/18] Base lower- and upper-case tests on Unicode properties
Date:	Tue, 04 Oct 2016 09:54:30 +0300

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Tue,  4 Oct 2016 03:10:38 +0200
> 
> +** 'upper' and 'lower' character classes are unaffected by case table
> +since they are now based purely on Unicode properties.

This is actually a backward-incompatible change, isn't it?  If so, it
should be in the corresponding section of NEWS.  More importantly,
there should be a way to get back the old behavior, i.e. to force
'upper' and 'lower' to use the current case tables.

Better yet, can we use the Unicode properties only where case tables
are insufficient, like in the case of ligatures being broken up into
individual characters by case conversions?  That'd be
backward-compatible, so it wouldn't risk breaking existing code.
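
To make the fallback idea concrete, here is a minimal sketch in Python
rather than in the C of the actual patch.  The two dicts are a
hypothetical stand-in for a case table (real case tables are
char-tables), and the function names are made up for illustration:

```python
import unicodedata

def is_upper(ch, down, up):
    """DOWN maps upper -> lower, UP maps lower -> upper (hypothetical case table)."""
    if ch in down or ch in up:
        # The case table knows this character, so it decides.
        return ch in down
    # Otherwise fall back to the Unicode general category.
    return unicodedata.category(ch) == "Lu"

def is_lower(ch, down, up):
    if ch in down or ch in up:
        return ch in up
    return unicodedata.category(ch) == "Ll"

# A tiny table that only knows about "A" and "a".
down = {"A": "a"}
up = {"a": "A"}

# U+FB01 (the "fi" ligature) has no entry in the table, so the
# Unicode property answers for it.
print(is_lower("a", down, up), is_lower("\ufb01", down, up))
```

Characters the table covers keep their current behavior; only those
it is silent about, such as the ligature, consult the Unicode data.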

I'm also okay with a defcustom, by default off, to prefer the Unicode
data, as you did, so that we could in the future make this the default
behavior.  But doing this right now without any transition period and
no way of going back is too radical, I think.
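
As a rough illustration of the opt-in variant, again in Python and not
the real implementation: the `prefer_unicode` argument below is a
hypothetical stand-in for such a defcustom, and `down_table` is a
plain dict playing the role of a case table.

```python
import unicodedata

def is_upper(ch, down_table, prefer_unicode=False):
    """PREFER_UNICODE is a hypothetical option; it is off by default."""
    if prefer_unicode:
        # New behavior: ignore the case table, use the Unicode property.
        return unicodedata.category(ch) == "Lu"
    # Old behavior: a character is upper-case if the table lowers it.
    return down_table.get(ch, ch) != ch

# U+0130 (capital I with dot above) is not in this tiny table, so the
# default says "not upper"; the option makes the Unicode data decide.
table = {"A": "a"}
print(is_upper("\u0130", table), is_upper("\u0130", table, prefer_unicode=True))
```

With the option off nothing changes for existing code, which is the
point of having a transition period.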

Please also note that Unicode tables are global, very large, and in
many cases tricky to change from Lisp (as compared to simple
char-tables).  So customizing the case conversions that are based
solely on the Unicode tables is much harder and/or has global
implications, unlike the case tables.  With that in mind, I think we
should make the transition smoother, and we should probably add
convenience APIs for customizing the case conversions the Unicode way,
before we switch to that as the default.

Thanks.
