bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

bug-grep



From:	Paul Eggert
Subject:	bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date:	Sun, 28 Sep 2014 08:09:33 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2

Zoltán Herczeg wrote:

> For me the question is whether binary search needs to be supported at the PCRE level.

It's purely a performance question. GNU grep already uses libpcre to search binary data, and it works now. It's just slow, that's all. I'm willing to live with this, and tell users "Sorry, but libpcre is not designed to search binary data quickly; if you want speed then don't use grep's -P option." If you're willing to live with this too, we're done.

> removing a lot of optimizations.

You shouldn't need to remove any optimizations for the PCRE_NO_UTF8_CHECK case. Keep them all. It should be just as fast as before. The idea is to have one matcher for the PCRE_NO_UTF8_CHECK case (one that works much as now) and another matcher for the non-PCRE_NO_UTF8_CHECK case (one that checks validity as it goes). The former matcher will be just as fast as now, and the latter matcher will be faster than what libpcre has now. I readily concede that this will require some nontrivial coding, but I don't concede that it will remove optimizations or make libpcre slower. It should make libpcre faster; that's the point.
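A toy C sketch of the two-matcher idea follows. It is not libpcre code: every function name is hypothetical, and the "matcher" only looks for one literal UTF-8 character. It does show the structural point, though. The trusted path (the PCRE_NO_UTF8_CHECK case) steps through characters with no checks at all. The checked path validates each character only when the matcher reaches it, so no separate validity prepass over the whole buffer is needed. Treating an invalid byte as a one-byte non-match is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trusted path (PCRE_NO_UTF8_CHECK case): the caller promises valid
   UTF-8, so the sequence length is read off the lead byte unchecked. */
static size_t
utf8_len_trusted (unsigned char lead)
{
  return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
}

/* Checked path: validate only the sequence at P (N bytes available),
   at the moment the matcher reaches it.  Returns its length, or 0 for
   an encoding error (bad lead byte, truncated or malformed sequence,
   overlong form, surrogate, or value above U+10FFFF). */
static size_t
utf8_len_checked (const unsigned char *p, size_t n)
{
  unsigned char c = p[0];
  size_t len;
  uint32_t cp, min;

  if (c < 0x80)
    return 1;
  if (c >= 0xC2 && c <= 0xDF)
    { len = 2; cp = c & 0x1F; min = 0x80; }
  else if (c >= 0xE0 && c <= 0xEF)
    { len = 3; cp = c & 0x0F; min = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4)
    { len = 4; cp = c & 0x07; min = 0x10000; }
  else
    return 0;

  if (n < len)
    return 0;
  for (size_t i = 1; i < len; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (p[i] & 0x3F);
    }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return len;
}

/* Hypothetical toy matcher, trusted version: offset of the first
   character whose encoding equals NEEDLE[0..NLEN), or -1. */
static ptrdiff_t
find_char_trusted (const unsigned char *buf, size_t n,
                   const unsigned char *needle, size_t nlen)
{
  for (size_t i = 0; i < n; i += utf8_len_trusted (buf[i]))
    if (n - i >= nlen && memcmp (buf + i, needle, nlen) == 0)
      return (ptrdiff_t) i;
  return -1;
}

/* Hypothetical toy matcher, checked version: same result on valid
   input, but validates as it goes; an invalid byte never matches and
   is skipped one byte at a time. */
static ptrdiff_t
find_char_checked (const unsigned char *buf, size_t n,
                   const unsigned char *needle, size_t nlen)
{
  size_t i = 0;
  while (i < n)
    {
      size_t len = utf8_len_checked (buf + i, n - i);
      if (len == 0)
        {
          i++;
          continue;
        }
      if (len == nlen && memcmp (buf + i, needle, nlen) == 0)
        return (ptrdiff_t) i;
      i += len;
    }
  return -1;
}

/* Dispatch once, up front, on whether the caller vouched for the data. */
static ptrdiff_t
find_char (const unsigned char *buf, size_t n,
           const unsigned char *needle, size_t nlen, int known_valid)
{
  return known_valid ? find_char_trusted (buf, n, needle, nlen)
                     : find_char_checked (buf, n, needle, nlen);
}
```

The choice is made once per call, so the trusted loop carries no per-byte validity tests, while the checked loop does only as much validation as the search actually touches.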

> You have a 100-byte buffer, and you start matching from byte 50.

Grep already does that sort of thing. And it's smart enough to start matching only at character boundaries. It's not libpcre's job to worry about this; the caller can worry about it.
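For the quoted scenario (a 100-byte buffer, matching from byte 50), the caller-side adjustment can be sketched in C. This is a hypothetical helper, not grep's actual code. It assumes UTF-8 input and moves the start offset forward past at most three continuation bytes, because a well-formed UTF-8 character has at most three. If more continuation bytes follow, the data at that point is not valid UTF-8 anyway, and the helper stops there.

```c
#include <stddef.h>

/* Nonzero if C is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int
is_continuation (unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Hypothetical caller-side helper: starting at OFF in BUF[0..N), skip
   forward over at most 3 continuation bytes so that matching does not
   begin in the middle of a multibyte character. */
static size_t
align_start (const unsigned char *buf, size_t n, size_t off)
{
  for (int k = 0; k < 3 && off < n && is_continuation (buf[off]); k++)
    off++;
  return off;
}
```

A caller would compute `align_start (buf, 100, 50)` and hand the result to the matcher as its start position, so the matcher itself never has to consider mid-character starts.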

> For me this is way too many checks, and it affects compiler optimizations too much.

The code you posted could be made faster than that; among other things, there should not be an unbounded backward scan. And even the code you posted would often be faster than what's in libpcre now. That early UTF-8 validity prepass is a killer.
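The code under discussion is not reproduced in this message, so the following is only a hypothetical C illustration of what a bounded backward scan might look like. To find where the character just before position `pos` begins, it looks back at most four bytes (the longest UTF-8 sequence) instead of scanning back until it happens to find a lead byte. It checks structure only (a lead byte followed by the right number of continuation bytes), not overlong forms or surrogates, and treats anything else as a one-byte encoding error. The function names are made up for the sketch.

```c
#include <stddef.h>

/* Expected sequence length for lead byte C, or 0 if C cannot start a
   sequence (a continuation byte or an invalid lead byte). */
static size_t
seq_len_from_lead (unsigned char c)
{
  if (c < 0x80)
    return 1;
  if (c >= 0xC2 && c <= 0xDF)
    return 2;
  if (c >= 0xE0 && c <= 0xEF)
    return 3;
  if (c >= 0xF0 && c <= 0xF4)
    return 4;
  return 0;
}

/* Hypothetical sketch: offset where the character ending just before
   POS begins.  Requires POS > 0.  Looks back at most 4 bytes; if the
   bytes found there do not form a sequence ending at POS, the single
   byte at POS - 1 is treated as its own (invalid) character. */
static size_t
prev_char_start (const unsigned char *buf, size_t pos)
{
  size_t limit = pos < 4 ? pos : 4;
  for (size_t k = 1; k <= limit; k++)
    {
      unsigned char c = buf[pos - k];
      if ((c & 0xC0) != 0x80)   /* first non-continuation byte */
        return seq_len_from_lead (c) == k ? pos - k : pos - 1;
    }
  return pos - 1;               /* only continuation bytes in the window */
}
```

Because the window is fixed, each call costs at most four byte inspections no matter how much malformed data precedes `pos`, and no prepass over the buffer is required.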


