Re: Case mapping of sharp s

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Case mapping of sharp s

From:	Stephen J. Turnbull
Subject:	Re: Case mapping of sharp s
Date:	Wed, 18 Nov 2009 14:33:57 +0900

Eli Zaretskii writes:

 > My idea was to use the same technique to fold the text as you search
 > through it.  You could, for example, fold one character at a time, so
 > that instead of comparing against X you compare against fold(X).

That's precisely how case-insensitive Boyer-Moore is done.  But it's
not that easy for Unicode, because you will need to check
*characters*, not octets, while Boyer-Moore is very much oriented
toward fixed-width charsets.

I think the problem here is that because of the nature of Unicode,
this folding often crosses character blocks (here meaning "sets of
characters with the same bits except for the lowest 6").  That means
that you will need to keep track not only of the folding for the
current octet, but you will also need to be able to backtrack to check
the higher bits.  By contrast, with the Mule internal encoding the
case pairs are always in the same block of 96 characters.

I don't think that it's *impossible* to implement Boyer-Moore with
translation for UTF-8, but it's nowhere near as easy as for the
unibyte charset case, which Mule internal can in practice be reduced
to.  It's possible that all this fiddling could result in a
performance hit of the same order of magnitude as the performance hit
of brute-forcing the comparison.

[Prev in Thread]

Current Thread

[Next in Thread]

Case mapping of sharp s, Ulrich Mueller, 2009/11/15
- Re: Case mapping of sharp s, Kenichi Handa, 2009/11/16
  - Re: Case mapping of sharp s, Ulrich Mueller, 2009/11/16
    - Re: Case mapping of sharp s, Kenichi Handa, 2009/11/17
    - Re: Case mapping of sharp s, Reiner Steib, 2009/11/17
  - Re: Case mapping of sharp s, Eli Zaretskii, 2009/11/16
    - Re: Case mapping of sharp s, Kenichi Handa, 2009/11/17
    - Re: Case mapping of sharp s, Eli Zaretskii, 2009/11/17
    - Re: Case mapping of sharp s, Kenichi Handa, 2009/11/17
    - Re: Case mapping of sharp s, Eli Zaretskii, 2009/11/17
    - Re: Case mapping of sharp s, Stephen J. Turnbull <=
    - Re: Case mapping of sharp s, Kenichi Handa, 2009/11/18
    - Re: Case mapping of sharp s, Stefan Monnier, 2009/11/18
    - Re: Case mapping of sharp s, Ulrich Mueller, 2009/11/18
    - Re: Case mapping of sharp s, Stefan Monnier, 2009/11/18
    - Re: Case mapping of sharp s, Eli Zaretskii, 2009/11/18
    - Re: Case mapping of sharp s, Stephen J. Turnbull, 2009/11/18
    - Re: Case mapping of sharp s, martin rudalics, 2009/11/17
- Re: Case mapping of sharp s, grischka, 2009/11/19
  - Re: Case mapping of sharp s, Stefan Monnier, 2009/11/19
    - Re: Case mapping of sharp s, David Kastrup, 2009/11/19

Prev by Date: Re: Case mapping of sharp s
Next by Date: Re: Case mapping of sharp s
Previous by thread: Re: Case mapping of sharp s
Next by thread: Re: Case mapping of sharp s
Index(es):
- Date
- Thread