bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gnulib's Knuth-Morris-Pratt implementation


From: Eric Blake
Subject: Re: gnulib's Knuth-Morris-Pratt implementation
Date: Fri, 4 Jan 2008 23:56:26 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Bruno Haible <bruno <at> clisp.org> writes:

> 
> But this argument also makes it unimportant to use Boyer-Moore over KMP:
> Once you've spent 5*n cycles on the simple algorithm, you don't care whether
> the computation will be finished in additional 2*n cycles or 2*n/m cycles.
> 
> > The main problem
> > though is that the B-M algorithm (I guess we'd likely select
> > Turbo-Boyer-Moore to reduce the comparisons from 3N to 2N) needs to
> > move backward through the haystack.  That doesn't work well on
> > multibyte strings.
> 
> Yes, that's the reason why I chose KMP in the first place.

After more browsing for string algorithms, I think the Two-Way algorithm may be 
the best fit for memmem (and strstr, if we decide to also replace that on 
systems with quadratic strstr):
 http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260

It has the nice property of O(1) preprocessing and O(N) comparisons (strictly 
bounded at 2N-M in the worst case), at the expense of random access to the data 
(therefore not appropriate for mbsstr) and an ordered alphabet (so I'm not sure 
if it can be used for strcase), but may actually be an algorithm that will be 
acceptable to the glibc people as it does not require malloc.  I'll play with 
implementing it for memmem, and see how it goes.

-- 
Eric Blake







reply via email to

[Prev in Thread] Current Thread [Next in Thread]