bug#24603: [PATCHv5 00/11] Casing improvements

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#24603: [PATCHv5 00/11] Casing improvements
Date:	Sat, 11 Mar 2017 12:00:12 +0200

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Thu,  9 Mar 2017 22:51:39 +0100
> 
> The first six patches (up to sigma casing rule) should be
> uncontroversial and unless there are objections I would like to get
> them submitted soon:
> 
>   Split casify_object into multiple functions
>   Introduce case_character function
>   Add support for title-casing letters  (bug#24603)
>   Split up casify_region function  (bug#24603)
>   Support casing characters which map into multiple code points 
>   Implement special sigma casing rule  (bug#24603)

Fine with me, modulo a few comments I posted to these 6 patches.

> The next patch adds ‘buffer-language’ buffer-local variable.  This
> seems to me as a sensible way of dealing with language-dependent rules
> and in the future I imagine the variable might be used for more
> cases, e.g. spell checking should automatically choose a dictionary
> based on it.  But perhaps there is another way which integrates with
> the rest of Emacs better:
> 
>   Introduce ‘buffer-language’ buffer-local variable

I think we should rather introduce a _function_ named buffer-language,
so that it's easier to extend this mechanism in the future to more
sophisticated and more fine-grained methods of determining the
language, such as text properties and/or overlays with special
properties.  The function could for now just return the value of a
buffer-specific variable, but I wouldn't expose and advertise that
variable so much as your code does.
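
To make that concrete, here is a minimal sketch of such an accessor.
All names in it (my-buffer-language, my-buffer-language--default, the
my-language property) are hypothetical and not taken from the patch.
The property lookup only illustrates one possible future extension; it
is not something the current code does.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: every name below is hypothetical.

(defvar-local my-buffer-language--default nil
  "Buffer-wide language as a symbol such as `nl' or `tr', or nil if unknown.
Internal; callers are expected to use `my-buffer-language' instead.")

(defun my-buffer-language (&optional pos)
  "Return the language in effect at POS (default: point), or nil.
A `my-language' overlay or text property at POS takes precedence
over the buffer-wide default."
  (or (get-char-property (or pos (point)) 'my-language)
      my-buffer-language--default))
```

Callers that only ever go through the function keep working if the
lookup later becomes more fine-grained than one value per buffer.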

In addition, your implementation seems to assume that the language
rules are independent of the country where that language is used,
i.e. that nl_NL and nl_BE will necessarily use the same rules for case
conversions.  Is this a good assumption?  Collation rules definitely
do depend on the country as well, AFAIK.
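
For illustration only, this is one way a lookup could accommodate
territory-specific rules while falling back to the bare language.  The
table, the rule symbols and the function name are made up for this
sketch.  The table deliberately contains no territory entries, since
whether any are needed is exactly the open question.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: table contents and all names are hypothetical.

(defvar my-case-rule-table
  '(("nl" . dutch-ij)
    ("tr" . turkic-dotless-i)
    ("az" . turkic-dotless-i)
    ("ga" . irish-prefix))
  "Alist mapping a locale key to a symbol naming a set of casing rules.
A key is either a bare language (\"nl\") or language plus territory
(\"nl_BE\"); the more specific key wins when both are present.")

(defun my-case-rules-for-locale (locale)
  "Return the rule symbol for LOCALE, e.g. \"nl_BE.UTF-8\", or nil."
  (let* ((base (car (split-string locale "[.@]"))) ; drop codeset/modifier
         (lang (car (split-string base "_"))))     ; drop territory
    (cdr (or (assoc base my-case-rule-table)
             (assoc lang my-case-rule-table)))))
```

With this shape, adding a country-specific exception later is a matter
of adding one more entry, without touching the callers.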

> The rest are just implementation of various language-specific rules.
> The implementation seems to be valid but it’s done purely in C which
> I guess still is a point of contention between me and Eli.

Yes, I'd still prefer that as many of the rules as possible be
specified in Lisp, thus avoiding the need to hard-code Unicode
codepoints and the associated rules in C.  I understand that the
support for each kind of rule should be available in C before the
rules can be used, but once such support is there, having the spec in
Lisp will allow us easier maintenance in the future, easier expansion
of this to cover additional languages that use the same types of
rules, and, with time, perhaps also automatic derivation of the rules
from the Unicode data files, thus providing for easier updates when a
new version of Unicode is incorporated.
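
As a rough illustration of what I mean by a Lisp database, the sketch
below keeps the Turkic dotted/dotless i mappings as data and consults
them before the default case table.  The variable and function names
and the data layout are assumptions made for this example, not a
proposal for the actual format.  Sequences such as the Dutch ij would
need a richer spec than this per-character alist.

```elisp
;;; -*- lexical-binding: t -*-
;; Sketch only: data layout and all names are hypothetical.

(defvar my-special-casing-rules
  '((tr (upcase   (?i . "İ"))    ; U+0069 -> U+0130
        (downcase (?I . "ı")))   ; U+0049 -> U+0131
    (az (upcase   (?i . "İ"))
        (downcase (?I . "ı"))))
  "Alist of (LANG (OPERATION (CHAR . REPLACEMENT)...)...).
REPLACEMENT is a string, so one character can map to several.")

(defun my-special-case (lang op char)
  "Return the replacement string for CHAR under OP in LANG, or nil."
  (cdr (assq char
             (cdr (assq op
                        (cdr (assq lang my-special-casing-rules)))))))

(defun my-upcase-char (char &optional lang)
  "Upcase CHAR as a string, honoring the rules for LANG if there are any."
  (or (my-special-case lang 'upcase char)
      (string (upcase char))))
```

Supporting another language that uses the same kind of rule then only
means adding an entry to the table.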

So I'd still urge you to try to refactor the code so that as many of
the rules as is feasible are implemented as a Lisp database.  But I
won't reject these patches if you don't want to do such refactoring.

Thanks.
