Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8
From: Jim Meyering
Subject: Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8
Date: Fri, 01 Jun 2012 23:13:11 +0200
Paul Eggert wrote:
> On 06/01/2012 08:18 AM, Jim Meyering wrote:
>> I've implemented the above, and have begun testing.
>
> Thanks. One thing I noticed when looking at the code is
> that the mbtolower API causes the compiler to refuse to
> put the 'size' local into a register. The following patch
> fixes this inefficiency. This undoubtedly interacts with
> what you're doing, so I'm just putting it into this email
> now as a heads-up.
Odd... that seems like a transformation that gcc should be able to perform.
If gcc someday learns to recognize and optimize this sort of code,
is it really worth making this change?
In any case, thanks for holding off.
I expect to push the Turkish-I fix tomorrow.
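Why a compiler generally cannot make this transformation on its own, as a minimal sketch (keep_pointer, halve_through_saved and escaped_local are hypothetical names, not grep code): once a local's address is passed to a function the compiler cannot see into, such as mbtolower in another source file, the callee might have stored the pointer, so any later opaque call might read or write the variable through it. Without interprocedural knowledge that the pointer does not escape, the compiler has to keep the variable in memory and reload it after such calls. The program below makes that possibility concrete within one file:

```c
#include <stddef.h>

/* Hypothetical stand-ins, not grep code: a callee that retains the
   pointer it was given, and a later call that writes through it.  */
static size_t *saved_ptr;

static void
keep_pointer (size_t *n)
{
  saved_ptr = n;            /* the address of the caller's local escapes */
}

static void
halve_through_saved (void)
{
  if (saved_ptr)
    *saved_ptr /= 2;        /* modifies the caller's local behind its back */
}

size_t
escaped_local (size_t size)
{
  keep_pointer (&size);
  halve_through_saved ();   /* size may have changed here... */
  saved_ptr = NULL;         /* avoid leaving a dangling pointer */
  return size;              /* ...so it must be reloaded from memory */
}
```

Here escaped_local (10) returns 5, not 10, so caching size in a register across the second call would be wrong. The copy-in/copy-out form in the patch avoids the problem by letting only a short-lived temporary escape.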
> Subject: [PATCH] grep: tune by allowing compiler to put size in register
>
> * src/dfasearch.c (EGexecute):
> * src/kwsearch.c (Fcompile, Fexecute):
> Pass address of otherwise-unused variable to mbtolower, so that
> the compiler can keep the often-used size variable in a register.
> ---
> src/dfasearch.c | 4 +++-
> src/kwsearch.c | 8 +++++---
> 2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/dfasearch.c b/src/dfasearch.c
> index bd09aa6..b7c0da7 100644
> --- a/src/dfasearch.c
> +++ b/src/dfasearch.c
> @@ -223,10 +223,12 @@ EGexecute (char const *buf, size_t size, size_t *match_size,
> {
> /* mbtolower adds a NUL byte at the end. That will provide
> space for the sentinel byte dfaexec may add. */
> - char *case_buf = mbtolower (buf, &size);
> + size_t case_size = size;
> + char *case_buf = mbtolower (buf, &case_size);
> if (start_ptr)
> start_ptr = case_buf + (start_ptr - buf);
> buf = case_buf;
> + size = case_size;
> }
> }
>
> diff --git a/src/kwsearch.c b/src/kwsearch.c
> index f1a802e..829fe12 100644
> --- a/src/kwsearch.c
> +++ b/src/kwsearch.c
> @@ -33,10 +33,10 @@ void
> Fcompile (char const *pattern, size_t size)
> {
> char const *err;
> - size_t psize = size;
> char const *pat = (match_icase && MB_CUR_MAX > 1
> - ? mbtolower (pattern, &psize)
> + ? mbtolower (pattern, &size)
> : pattern);
> + size_t psize = size;
>
> kwsinit (&kwset);
>
> @@ -87,10 +87,12 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
> {
> if (match_icase)
> {
> - char *case_buf = mbtolower (buf, &size);
> + size_t case_size = size;
> + char *case_buf = mbtolower (buf, &case_size);
> if (start_ptr)
> start_ptr = case_buf + (start_ptr - buf);
> buf = case_buf;
> + size = case_size;
> }
> }
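The copy-in/copy-out pattern from the patch, as a self-contained sketch. lower_copy and count_icase are hypothetical: lower_copy stands in for mbtolower (it handles only ASCII, so unlike the real function it never changes the byte count), and count_icase stands in for a search loop. Only case_size has its address taken; its value is copied back into size, which the loop then uses, so the compiler is free to keep size in a register. Whether it actually does so depends on the compiler and optimization level.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mbtolower: returns a lower-cased,
   NUL-terminated copy of BUF and stores the new length in *N.
   ASCII-only, so here the length never actually changes.  */
static char *
lower_copy (char const *buf, size_t *n)
{
  char *out = malloc (*n + 1);
  if (!out)
    return NULL;
  for (size_t i = 0; i < *n; i++)
    out[i] = (char) tolower ((unsigned char) buf[i]);
  out[*n] = '\0';
  return out;
}

/* Hypothetical search loop: count bytes equal to C, ignoring case.  */
size_t
count_icase (char const *buf, size_t size, char c)
{
  size_t case_size = size;      /* only this temporary's address escapes */
  char *case_buf = lower_copy (buf, &case_size);
  if (!case_buf)
    return 0;
  size = case_size;             /* copy back; size is never address-taken */

  int target = tolower ((unsigned char) c);
  size_t hits = 0;
  for (size_t i = 0; i < size; i++)   /* hot loop reads size */
    if ((unsigned char) case_buf[i] == target)
      hits++;
  free (case_buf);
  return hits;
}
```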