Re: Character folding in the pretest

emacs-devel

From:	Per Starbäck
Subject:	Re: Character folding in the pretest
Date:	Sat, 6 Feb 2016 10:37:06 +0100

Oscar Fuentes wrote:

> If a Spaniard inputs "sana" on a search box and "saña" is found, he
> will regard the software as either buggy, dumb or completely
> oblivious to Spanish culture.

Similar to my example of how a Swede would see a search for "varpa"
finding "värpa" or "varpå" (all three being existing, totally
different words).

When met with the "argument" that not many people speak Swedish anyway
I replied that it was only an example of what I knew best, and that
there probably were similar examples in several other languages. I'm
glad to hear there is one in Spanish, one of the largest languages of
the world. Now let's count the number of affected people again! :)

That character folding is dependent on locale is of course well-known
by those who work on this. Artur Malabarba wrote:

> FTR, like I've said a couple of times already, I will invest more time
> into making this customizable once I've seen how it's received.
> Also (and this I haven't said yet) I do plan on providing a better
> default depending on locale. When the time comes to actually implement
> it I'll explain why I prefer locale (over some notion of buffer-local
> language).

When Artur again confirmed that he is fine with having the new feature
turned off in Emacs 25 with the intention of having it turned on later,
after it has had enough testing, I thought this would finally be settled.

But evidently not yet... Those opposed have argued as if this were
something mandated by Unicode, so that we can do nothing but follow.
It doesn't matter if the result is seen as buggy or dumb by users.
"This feature is simply folding as specified by the Unicode standard".

That is not so. Of course the Unicode Consortium is well aware of the
issues that I, Oscar and others are pointing out, and that I'm sure
Artur is well aware of.

Eli Zaretskii:
> Perhaps you aren't familiar with Unicode equivalence, in which case I
> suggest these sources:
>
>   http://unicode.org/reports/tr10/#Searching
>   http://www.unicode.org/notes/tn5/
>   http://www.unicode.org/reports/tr30/tr30-4.html

But of course these take up issues like we have mentioned here. The
first one mentions the aa/å equivalence in Danish for example. And to
quote the last one:

#  In the general case, different search term foldings are applied for
#  different languages. For example, accent distinctions are ignorable
#  for some languages, but not for others. In English the accent in
#  words like naïve is optional, while to a Swedish user 'o' and 'ö'
#  are distinct letters.

That is, by the way, the last draft of a withdrawn technical report.

  Draft UTR #30: Unicode Character Foldings has been withdrawn. It was
  never formally approved; the last public version was a draft
  UTR, which can be found at
  http://www.unicode.org/reports/tr30/tr30-4.html.

That shows not only that the issues I, Oscar and others are mentioning
are not something new that we just thought of that Unicode somehow
should have us ignore. It also shows that there *is* no technical
report on Unicode Character Foldings.
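
As a rough illustration of the point in the quoted passage (not Emacs's
actual implementation), locale-dependent folding can be sketched as
follows. The DISTINCT_LETTERS table, the language codes and the function
names are assumptions made up for this example; a real implementation
would take its tailorings from locale data.

```python
import unicodedata

# Hypothetical, illustrative table: letters that a language treats as
# distinct letters of its alphabet and that must therefore not be folded.
DISTINCT_LETTERS = {
    "sv": set("åäöÅÄÖ"),  # Swedish
    "es": set("ñÑ"),      # Spanish
    "en": set(),          # English: accents treated as optional here
}


def fold(text, lang):
    """Strip combining marks, except on letters distinct in `lang`."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            # Leave the language's own letters untouched.
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def matches(query, word, lang):
    """True if `query` and `word` are equal after folding for `lang`."""
    return fold(query, lang) == fold(word, lang)


print(matches("varpa", "värpa", "en"))  # folded together
print(matches("varpa", "värpa", "sv"))  # kept apart
print(matches("sana", "saña", "es"))    # kept apart
print(matches("naive", "naïve", "sv"))  # folded together
```

With this sketch, "varpa" matches "värpa" under the "en" table but not
under "sv", while "naive" still matches "naïve" under "sv", because only
the letters listed for a language are exempt from folding.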

We have to break out of the circles this is going in. John Wiegley wrote:

> A locale-based quotient for natural language text seems like a reasonable
> default, unless pretesting/polling shows us otherwise. However, there will
> always be times when you don't want it, or you want a different quotient
> altogether, or even various combinations of them.

Yes, that would be a good default, but it's not a default that we
can have in the next Emacs; there are great prospects that we can
have it in the one after that. Please John, put your foot down and don't
let this continue ad infinitum.

The options we have are instead:

(1) Let the default be as searching has worked before. Nothing gets
worse for anyone.

We'll have the start of a new exciting feature available, one that will
be just right for many users, and that will be tried by a lot of others
as well, giving feedback for the continued development that Artur has
written that he is already planning.

(2) Make the fundamental feature of searching work fundamentally
differently out of the box, in a way that many users will see as
neat, and many users will see as "buggy, dumb or completely
oblivious to" the user's culture.
