[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: faster fnmatch
From: |
Bruno Haible |
Subject: |
Re: faster fnmatch |
Date: |
Fri, 17 Apr 2009 13:02:57 +0200 |
User-agent: |
KMail/1.9.9 |
Hello Ondrej,
> > Hello. I am writing partial fnmatch to speed up locate et al.
Cool! We know for some time already that this is a bottleneck [1].
I find it also interesting that you go for a two-step approach,
preprocess the pattern once and use it for matching often - the same
approach that we considered useful in [2].
> > FOLD causes worst performance slowdown.
> > From what I tried is best convert in buffer to uppercase when needed.
>
> This seems like an attractive option but I'm a little concerned about
> whether this will produce the correct result with characters like ß or
> Ï and ï.
> ... Bruno has implemented a
> large amount of Unicode support in gnulib, and this may in fact help
Yes, implementing a case-insensitive matching by doing an uppercasing
pass on both sides is an old technique that ignores the properties of
a couple of languages. For good support of all languages, it's necessary
to use a "casefolding" pass, such as the one described by the Unicode
standard and implemented in gnulib (and soon, libunistring), in
"unicase.h". In particular the function ulc_u8_casefold from
lib/unicase/ulc-casecmp.c looks like what's needed as preprocessing pass
for an arbitrary file name in locale encoding.
Note that this Unicode aware casefolding provides correctness; it will,
however, likely decrease the speed of the matching in 'find' (whereas in
'locate', the file name list can be processed ahead of time).
Bruno
[1] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00040.html
[2] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00071.html