emacs-devel

Re: Character folding in the pretest


From: Werner LEMBERG
Subject: Re: Character folding in the pretest
Date: Fri, 05 Feb 2016 07:01:03 +0100 (CET)

>> This naturally leads to a possible user option: Having `optical'
>> matches or not, where `optical' means `base character plus
>> diacritic and/or slight modifications', e.g., o → ø → ö etc., etc.
> 
> How do you even define "optical similarities"?

Basically the same as Eli has described: Base character plus
diacritics, probably plus some basic shapes with `diacritics' that
Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
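
A minimal sketch of this definition, assuming Python's standard
`unicodedata` module.  The `EXTRA_FOLDS` table and the name `fold` are
hypothetical and cover only the three letters named above; this is not
how Emacs implements character folding.

```python
import unicodedata

# Hypothetical table for letters that Unicode does not decompose into
# base character + combining mark.  Only the examples from the text.
EXTRA_FOLDS = str.maketrans({
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
})

def fold(text):
    """Reduce each character to its base character."""
    # NFD splits composable characters into base + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (nonzero canonical combining class).
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Handle the non-composable shapes via the explicit table.
    return stripped.translate(EXTRA_FOLDS)
```

Under this sketch `fold("ö")` and `fold("ø")` both give `"o"`, while
`l' and `I' stay distinct because neither is derived from the other.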

> Should l and I compare the same under this definition?  They
> certainly look similar.

No, since the similarity is a font issue only.  For this reason I
*never* use Arial-like fonts.

> What about p and q?  They look like mirror images of each other.
> What about z and s?  They even sound similar.

Nonsense.  I've clearly mentioned `base character plus diacritic'.
Why do you intentionally skip that?  Doing so reminds me of
Schopenhauer's first stratagem in `The Art of Being Right'...

> To a Swedish speaker there are zero similarities between a, ä and å.

I'm a native German speaker, and there is *zero* similarity in the
sound between `a' and `ä'.  But English texts quite commonly omit the
diaeresis dots, so a search mode that finds both `Hänsel und Gretel'
and `Hansel and Gretel' at the same time would be very valuable.
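
Such a search mode could be sketched as follows, assuming Python's
standard `unicodedata` module.  The names `fold_char` and `find_folded`
are hypothetical; the sketch folds both the search string and the text,
and maps every hit back to an offset in the original text.

```python
import unicodedata

def fold_char(ch):
    """Return ch without combining marks (may be empty for a bare mark)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def find_folded(needle, text):
    """Return start offsets in text where needle matches after folding."""
    folded = []   # folded characters of text
    origin = []   # origin[k] = index in text that produced folded[k]
    for i, ch in enumerate(text):
        for c in fold_char(ch):
            folded.append(c)
            origin.append(i)
    haystack = "".join(folded)
    key = "".join(fold_char(c) for c in needle)
    if not key:
        return []
    hits = []
    start = haystack.find(key)
    while start != -1:
        hits.append(origin[start])
        start = haystack.find(key, start + 1)
    return hits
```

With the text "Hänsel und Gretel; Hansel and Gretel", searching for
either "Hansel" or "Hänsel" reports both occurrences.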

> My personal preference is that the expected behaviour of searches is
> more related to the locale of the user, rather than that of the
> document being searched.  In other words, as a non-Spanish speaker,
> I'd expect to be able to find ñ when searching for n, even if the
> document I'm searching in is in Spanish.  There is definitely an
> infinite number of counter-examples to this (enough to keep this
> thread going for another 100 messages, I'm sure), but at least there
> is reason to consider making the default based on the locale of the
> user.

What you describe naturally leads to another user option: Don't handle
characters as `equal' (with a proper definition of `equal') that
aren't `equal' in the user's locale.
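
One way to picture such an option, as a sketch only: keep a per-locale
set of letters that must never be folded.  The `LOCALE_DISTINCT` table
and the name `fold_for_locale` are hypothetical, and the two entries
are illustrative rather than taken from any locale database.

```python
import unicodedata

# Hypothetical per-locale exceptions: letters that count as distinct
# from their base character for users of that locale.
LOCALE_DISTINCT = {
    "sv": {"å", "ä", "ö"},
    "es": {"ñ"},
}

def strip_marks(ch):
    """Remove combining marks from a single character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def fold_for_locale(text, locale):
    """Fold diacritics, except on letters distinct in the given locale."""
    distinct = LOCALE_DISTINCT.get(locale, set())
    out = []
    # NFC first, so that each accented letter is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch.lower() in distinct:
            out.append(ch)               # not `equal' in this locale
        else:
            out.append(strip_marks(ch))  # fold to the base character
    return "".join(out)
```

With these tables a user in an English locale would find `ñ' when
searching for `n', while a user in a Spanish locale would not.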


    Werner
