bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug libc/5514] memmem is O(n^2), but should be O(n)


From: Eric Blake
Subject: Re: [Bug libc/5514] memmem is O(n^2), but should be O(n)
Date: Thu, 03 Jan 2008 22:14:20 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to jakub at redhat dot com on 12/21/2007 7:00 AM:
> ------- Additional Comments From jakub at redhat dot com  2007-12-21 14:00 
> -------
>> Can you explain why?
> 
> Because the malloc call makes the function no longer async-signal safe.
> While POSIX says nothing about memmem, it is quite desirable to let
> the basic mem* and str*/stp* memory/string ops are async-signal safe (of 
> course
> strdup is an exception here).

Jakub raises an interesting point here.  While memmem is not standardized,
it is similar to strstr, yet POSIX makes no requirements that any of the
str* functions be async-signal safe.  Is this worth raising with the
Austin Group for clarification in the next version of POSIX?

Meanwhile, this means that it is likely that glibc's strstr and memmem
will NEVER be fixed to be O(n) (the only way to avoid the O(n^2) worst
case is to allocate O(n) memory, trading space for time).  Perhaps it is
worth splitting the memmem module into two - memmem, which provides memmem
on systems that lack it or that have non-performance bugs, but can live
with glibc's (and other platforms') deficient performance in the worst
case, and used by programs that control their use of memmem such that the
needle never exceeds the threshold that triggers the O(n^2) behavior; and
memmem-fast (or perhaps invent a different interface name, to distinguish
the performance improvement at the expense of async-safety) that
guarantees the O(n) performance but documents that it is no longer async-safe.

And another thought: while mbsstr favors KMP because it is much easier to
traverse multibyte characters in forward order, memmem would probably be
better off with a variant like turbo-Boyer-Moore; both algorithms have
O(n) worst case, but where searching backwards is not prohibitive
(remember, memmem searches bytes, not multibyte characters), TBM has the
nice property of superlinear speed in the common case.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHfcCr84KuGfSFAYARAgDTAJ41KyE3wZoyimaZwwnD/94VqKHyRACeM5MT
XnvaoE2h3YRCBAv1guoSeO8=
=Pt6z
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]