bug-gnulib

Re: faster fnmatch


From: Ondrej Bilka
Subject: Re: faster fnmatch
Date: Sat, 18 Apr 2009 11:01:54 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Apr 17, 2009 at 01:02:57PM +0200, Bruno Haible wrote:
> Hello Ondrej,
> 
> > > Hello. I am writing partial fnmatch to speed up locate et al.
> 
> Cool! We have known for some time already that this is a bottleneck [1].
> I find it also interesting that you go for a two-step approach,
> preprocess the pattern once and use it for matching often - the same
> approach that we considered useful in [2].
I have finished a version that supports only * and ?; it is attached.
I looked further into the source and discovered that fnmatch doesn't work
as I had imagined. By default it converts the strings into wide characters
and matches on those. UTF-8 allows the search to be done bytewise, which is
faster in most cases.
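
To illustrate what matching directly on UTF-8 bytes can look like, here is a
minimal sketch. It is not the attached fnmatch2.c; the names utf8_next and
glob_utf8 are hypothetical, and it assumes that both pattern and string are
valid UTF-8 and that only * and ? are special.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Step over one UTF-8 encoded character: one lead byte plus any
   continuation bytes (10xxxxxx). Assumes valid UTF-8 input. */
static const char *utf8_next(const char *s)
{
    if (*s == '\0')
        return s;
    s++;
    while (((unsigned char)*s & 0xC0) == 0x80)
        s++;
    return s;
}

/* Hypothetical helper: match str against pat where only '*' and '?'
   are special. Literal bytes are compared directly, with no conversion
   to wide characters; '?' consumes one whole character. */
static bool glob_utf8(const char *pat, const char *str)
{
    const char *star_pat = NULL;  /* pattern position after the last '*' */
    const char *star_str = NULL;  /* string position where that '*' retries */

    assert(pat != NULL && str != NULL);
    while (*str != '\0') {
        if (*pat == '*') {
            star_pat = ++pat;
            star_str = str;
        } else if (*pat == '?') {
            pat++;
            str = utf8_next(str);
        } else if (*pat != '\0' && *pat == *str) {
            pat++;
            str++;
        } else if (star_pat != NULL) {
            /* Backtrack: let the last '*' absorb one more character. */
            pat = star_pat;
            star_str = utf8_next(star_str);
            str = star_str;
        } else {
            return false;
        }
    }
    while (*pat == '*')
        pat++;
    return *pat == '\0';
}
```

Because backtracking always advances by a whole character, a literal lead
byte in the pattern is only ever compared against a character boundary in
the string, so no decoding is needed for literal runs.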

Is it OK to just use the original fnmatch if the pattern contains an
extended wildcard or a [] with a non-ASCII symbol?
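
A rough sketch of such a fallback test follows. The function name
needs_original_fnmatch and its extmatch parameter are hypothetical, and the
check is deliberately conservative: it falls back whenever the pattern has
an unescaped [ and any non-ASCII byte at all, rather than parsing the
bracket expression precisely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the pattern should be handed to
   the original fnmatch instead of the fast path.
   extmatch: whether ksh-style extended patterns such as @(a|b) are
   enabled (an assumption standing in for the caller's flags). */
static bool needs_original_fnmatch(const char *pat, bool extmatch)
{
    bool has_bracket = false;
    bool has_non_ascii = false;

    assert(pat != NULL);
    for (const char *p = pat; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c == '\\' && p[1] != '\0') {
            /* Escaped byte: never special, but may still be non-ASCII. */
            p++;
            if ((unsigned char)*p >= 0x80)
                has_non_ascii = true;
        } else if (c >= 0x80) {
            has_non_ascii = true;
        } else if (c == '[') {
            has_bracket = true;
        } else if (extmatch && p[1] == '(' && strchr("?*+@!", c) != NULL) {
            return true;  /* extended wildcard */
        }
    }
    /* Conservative: any '[' together with any non-ASCII byte. */
    return has_bracket && has_non_ascii;
}
```

Over-approximating costs only speed: a pattern that is sent to the original
fnmatch unnecessarily is still matched correctly.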
> Yes, implementing a case-insensitive matching by doing an uppercasing
> pass on both sides is an old technique that ignores the properties of
> a couple of languages. For good support of all languages, it's necessary
> to use a "casefolding" pass, such as the one described by the Unicode
> standard and implemented in gnulib (and soon, libunistring), in
> "unicase.h". In particular the function ulc_u8_casefold from
> lib/unicase/ulc-casecmp.c looks like what's needed as preprocessing pass
> for an arbitrary file name in locale encoding.
Here is a casefold patch for fnmatch (abusing wchar_t = u32).
Shouldn't normalization be added there as well?
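
One reason a casefolding pass differs from a per-character towlower is that
folding can change the length of the string, so the fold has to report a new
length. A toy sketch of this follows; the two-rule table (ASCII letters and
U+00DF folding to "ss") is illustrative only and is not the Unicode data
used by unicase.h, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy fold: ASCII A-Z -> a-z, U+00DF (sharp s) -> "ss".
   Writes at most out_cap code points to out and returns the folded
   length, or (size_t)-1 if out is too small. */
static size_t toy_casefold(const uint32_t *in, size_t n,
                           uint32_t *out, size_t out_cap)
{
    size_t len = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t c = in[i];
        if (c == 0x00DF) {
            /* One input code point becomes two output code points. */
            if (out_cap - len < 2)
                return (size_t)-1;
            out[len++] = 's';
            out[len++] = 's';
        } else {
            if (len >= out_cap)
                return (size_t)-1;
            if (c >= 'A' && c <= 'Z')
                c += 'a' - 'A';
            out[len++] = c;
        }
    }
    return len;
}

/* Compare two short strings caselessly by folding both sides first. */
static bool toy_caseless_equal(const uint32_t *a, size_t na,
                               const uint32_t *b, size_t nb)
{
    uint32_t fa[64], fb[64];
    size_t la = toy_casefold(a, na, fa, 64);
    size_t lb = toy_casefold(b, nb, fb, 64);

    if (la == (size_t)-1 || lb == (size_t)-1)
        return false;
    return la == lb && memcmp(fa, fb, la * sizeof fa[0]) == 0;
}
```

With this table, the six code points of STRA + U+00DF + E fold to the seven
code points of "strasse", which is why the end pointer passed to the matcher
has to be computed from the folded length.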

--- fnmatch.c.old       2008-10-16 20:59:48.000000000 +0200
+++ fnmatch.c   2009-04-18 09:22:26.109919887 +0200
@@ -51,6 +51,7 @@
 #if defined _LIBC || WIDE_CHAR_SUPPORT
 # include <wctype.h>
 # include <wchar.h>
+# include <unicase.h>
 #endif
 
 /* We need some of the locale data (the collation sequence information)
@@ -177,7 +178,12 @@
 
 
 # if HANDLE_MULTIBYTE
-#  define FOLD(c) ((flags & FNM_CASEFOLD) ? towlower (c) : (c))
+#ifdef _LIBC
+       #define FOLD(c) (c)
+#else
+       #define FOLD(c) ((flags & FNM_CASEFOLD) ? towlower (c) : (c))
+#endif
+
 #  define CHAR wchar_t
 #  define UCHAR        wint_t
 #  define INT  wint_t
@@ -327,10 +333,19 @@
              mbsrtowcs (wpattern, &pattern, patsize, &ps);
              assert (mbsinit (&ps));
              mbsrtowcs (wstring, &string, strsize, &ps);
-
-             res = internal_fnwmatch (wpattern, wstring, wstring + strsize - 1,
+                               wchar_t *wfoldpattern,*wfoldstring;
+                               wfoldpattern=wpattern;wfoldstring=wstring;
+#ifdef _LIBC
+                               if (flags & FNM_CASEFOLD){
+                                       wfoldpattern=u32_casefold(wpattern,patsize,uc_locale_language (), NULL,NULL,&patsize);
+                                       wfoldstring=u32_casefold(wstring,strsize,uc_locale_language (), NULL,NULL,&strsize);
+                               }
+#endif                         
+             res = internal_fnwmatch (wfoldpattern, wfoldstring, wfoldstring + strsize - 1,
                                       flags & FNM_PERIOD, flags);
-
+#ifdef _LIBC
+       if (flags & FNM_CASEFOLD){free(wfoldpattern);free(wfoldstring); }
+#endif
              if (__builtin_expect (! (totsize < ALLOCA_LIMIT), 0))
                free (wpattern);
              return res;

Attachment: fnmatch2.c
Description: Text Data

Attachment: fnmatch2.h
Description: Text Data

Attachment: fnmatch2_loop.h
Description: Text Data

