Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a

bug-grep


From:	Thomas Wolff
Subject:	Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Date:	Fri, 06 Nov 2009 17:39:46 +0100
User-agent:	Thunderbird 2.0.0.23 (Windows/20090812)

[forgot to CC to bug-grep before, so I'm resending this, with one more comment, and leaving out cygwin-specific parts]


Corinna Vinschen wrote:

On Nov  6 16:00, Thomas Wolff wrote:

...

I extended your test program to demonstrate the inefficiency of the
standard mbrtowc function. [...]

I later had to correct:

Anyway, corrected results are still by a factor of 3 to 4 in favor of my algorithm.

Corinna wrote:

That's sort of an unfair test.  Your utftouni function doesn't care for
mbstate, error, and surrogate pair handling.

This is a question of use cases:

* mbstate is needed e.g. if you feed results of read() which possibly come in arbitrary chunks directly into mbtowc(); it's not needed if you only transform complete lines of text at once. The stdlib function is a little bit too generic (and thus complicated, too) for many applications.
* error handling is there, in my function; it's simplified, incorrect sequences are all mapped to 0 for the test case but they could as well return an error indication without performance impact.
* surrogate pair handling is only needed if you pass the string from/to the Windows API. It's not needed for POSIX applications (provided wchar_t would be sufficiently wide). So if wchar_t can be extended in the newlib API, it might be useful to have two implementations; one for applications (w/o surrogates), one for cygwin itself.
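To make the trade-off in the points above concrete, here is a minimal sketch of a stateless UTF-8 decoder of the kind being discussed. It is not the utftouni function from the test program (that code is not reproduced in this message); the names utf8_decode and count_code_points are hypothetical. It assumes the caller passes a complete line, so no mbstate is kept between calls, and it maps every invalid sequence to 0, as described for the test case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Decodes one UTF-8 sequence from s, which must hold n >= 1 bytes.
   Stores the code point in *cp (0 for any invalid sequence) and
   returns the number of bytes consumed (always at least 1). */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    uint32_t c, min;
    size_t len, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }            /* ASCII fast path */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else { *cp = 0; return 1; }       /* stray continuation or bad lead byte */

    for (i = 1; i < len; i++) {
        /* truncated input or missing continuation byte: report 0 */
        if (i >= n || (s[i] & 0xC0) != 0x80) { *cp = 0; return i; }
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* reject overlong forms, values beyond U+10FFFF, and encoded surrogates */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        c = 0;
    *cp = c;
    return len;
}

/* Usage sketch: walk one complete line and count decoded code points. */
static size_t count_code_points(const char *line, size_t n)
{
    size_t i = 0, count = 0;
    uint32_t cp;

    while (i < n) {
        i += utf8_decode((const unsigned char *)line + i, n - i, &cp);
        count++;
    }
    return count;
}
```

Returning an error indication instead of 0 would only change the three places that assign 0 to *cp. The result type here is uint32_t, so no surrogate pairs are ever produced; a variant for a 16-bit wchar_t would need an extra step for code points above U+FFFF.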



[...]

My main point was that, depending on the use case, some applications would be better off using less generic, optimized functions. The kind of dogmatic suggestion (as seen in the "locale scene") that everybody should use the stdlib wide character functions is often misleading.

grep and sed would certainly be well advised to change that.

Thomas
