From: Elias Mårtenson
Subject: Re: Character folding in the pretest
Date: Fri, 5 Feb 2016 14:36:13 +0800
>> This naturally leads to a possible user option: Having `optical'
>> matches or not, where `optical' means `base character plus
>> diacritic and/or slight modifications', e.g., o → ø → ö etc., etc.
>
> How do you even define "optical similarities"?
Basically the same as what Eli has described: a base character plus
diacritics, probably also some basic shapes with `diacritics' that
Unicode doesn't represent as composable (they have no canonical
decomposition): o → ø, l → ł, d → đ, etc.
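To make the definition concrete, here is a minimal Python sketch of such `optical' folding. It only illustrates the idea and is not Emacs's own implementation: it decomposes text with Unicode NFD, drops nonspacing combining marks (general category Mn), and then consults a small hand-written table for letters like ø, ł and đ that NFD leaves alone. The names `optical_fold` and `EXTRA_FOLDS` are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# Hypothetical table for letters whose "diacritic" is part of the glyph
# itself, so Unicode gives them no canonical decomposition.
# Illustrative only, not a complete list.
EXTRA_FOLDS = {
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
}


def optical_fold(text: str) -> str:
    """Map each character to its base character, dropping diacritics."""
    # NFD splits precomposed characters, e.g. "ö" -> "o" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category Mn).
    stripped = "".join(c for c in decomposed
                       if unicodedata.category(c) != "Mn")
    # Handle letters that NFD leaves untouched.
    return "".join(EXTRA_FOLDS.get(c, c) for c in stripped)
```

For example, `optical_fold("ö ø ł đ")` returns `"o o l d"`, and `optical_fold("Hänsel")` returns `"Hansel"`.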
> Should l and I compare the same under this definition? They
> certainly look similar.
No, since the similarity is a font issue only. For this reason I
*never* use Arial-like fonts.
> What about p and q? They look like mirror images of each other.
> What about z and s? They even sound similar.
Nonsense. I've clearly mentioned `base character plus diacritic'.
Why do you deliberately ignore that? Doing so reminds me of
Schopenhauer's first stratagem in `The Art of Being Right'...
> To a Swedish speaker there are zero similarities between a, ä and å.
I'm a native German speaker, and there is *zero* similarity in sound
between, say, `a' and `ä'. But it is quite common in English texts,
for example, to omit the diaeresis dots, so a search mode that finds
both `Hänsel und Gretel' and `Hansel and Gretel' with a single search
would be very valuable.
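A search mode like this has to report matches in the original, unfolded text. One way to sketch that, again in Python and with made-up names (`fold_char`, `folding_search`), is to fold the text one character at a time while recording which original character produced each folded one, so a hit in the folded string maps straight back. This assumes folding only strips decomposable diacritics and that the search is case-sensitive.

```python
import unicodedata


def fold_char(ch: str) -> str:
    """Strip decomposable diacritics from one character.

    A bare combining mark folds to the empty string.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def folding_search(needle: str, haystack: str) -> int:
    """Return the haystack index where needle matches after folding, or -1."""
    folded_chars = []
    origin = []  # origin[i] is the haystack index that produced folded_chars[i]
    for i, ch in enumerate(haystack):
        for f in fold_char(ch):
            folded_chars.append(f)
            origin.append(i)
    target = "".join(fold_char(ch) for ch in needle)
    if not target:
        return 0  # like str.find, an empty needle matches at the start
    pos = "".join(folded_chars).find(target)
    return -1 if pos == -1 else origin[pos]
```

With this, `folding_search("Hansel", "Hänsel und Gretel")` and `folding_search("Hänsel", "Hansel and Gretel")` both return 0. Keeping the index map, instead of searching the folded string directly, matters because folding can shorten the text, for instance when the input already contains separate combining marks.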
What you describe naturally leads to another user option: Don't treat
characters as `equal' (with a proper definition of `equal') if they
aren't `equal' in the user's locale.
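Such a locale-dependent option could be layered on top of the folding itself by consulting a per-locale list of characters that must stay distinct. The following Python sketch assumes a hypothetical table `NO_FOLD` keyed by locale name and a hypothetical function `fold_for_locale`; its single Swedish entry (å, ä, ö, which are separate letters of the Swedish alphabet) is illustrative, and real code would need proper per-locale data.

```python
import unicodedata

# Hypothetical per-locale table: characters that are separate letters of
# the alphabet in that locale and so must never be folded.
# Illustrative only.
NO_FOLD = {
    "sv": set("åäöÅÄÖ"),
}


def fold_for_locale(text: str, locale: str) -> str:
    """Strip diacritics except from characters the locale treats as letters."""
    keep = NO_FOLD.get(locale, set())
    # Compose first so that "a" + U+0308 is recognized as "ä".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
        else:
            # Decompose and drop nonspacing combining marks (category Mn).
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if unicodedata.category(c) != "Mn"))
    return "".join(out)
```

Here `fold_for_locale("Hänsel", "de")` gives `"Hansel"`, while under `"sv"` the ä is left alone. Normalizing to NFC first means a decomposed `a' plus combining diaeresis is treated the same as a precomposed `ä'.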