[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a
From: |
Eric Blake |
Subject: |
Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file |
Date: |
Fri, 06 Nov 2009 06:56:37 -0700 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Corinna Vinschen on 11/6/2009 6:51 AM:
>> The problem *is* with grep (and sed), however, because there is no
>> good reason that UTF-8 should give us a penalty of being 100times
>> slower on most search operations, this is just poor programming of
>> grep and sed.
>
> The penalty on Linux is much smaller, about 15-20%. It looks like
> grep is calling malloc for every input line if MB_CUR_MAX is > 1.
> Then it evaluates for each byte in the line whether the byte is a
> single byte or the start of a multibyte sequence using mbrtowc on
> every charatcer on the input line. Then, for each potential match,
> it checks if it's the start byte of a multibyte sequence and ignores
> all other matches. Eventually, it calls free, and the game starts
> over for the next line.
Adding bug-grep, since this slowdown caused by additional mallocs is
definitely the sign of a poor algorithm that could be improved by reusing
existing buffers.
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkr0KxUACgkQ84KuGfSFAYCOCACgvjz2v65vK8DIcGg6zfnLQgcT
tfQAmwbpWbriBJSv0rjYobYgsh4KXOiZ
=B3nZ
-----END PGP SIGNATURE-----
- Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file,
Eric Blake <=