Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8
From: Paul Eggert
Subject: Re: [bug #36567] grep -i (case-insensitive) is broken with UTF8
Date: Fri, 01 Jun 2012 09:03:44 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1
On 06/01/2012 08:18 AM, Jim Meyering wrote:
> I've implemented the above, and have begun testing.
Thanks. One thing I noticed when looking at the code is
that the mbtolower API takes the address of its size argument,
which keeps the compiler from putting the 'size' local into
a register. The following patch fixes this inefficiency.
This undoubtedly interacts with what you're doing, so I'm
just putting it into this email now as a heads-up.
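
To illustrate the idea outside of grep, here is a minimal sketch of the calling pattern the patch introduces. The names lower_copy and count_lower are hypothetical and exist only for this example; lower_copy is an ASCII-only stand-in that allocates its result with malloc, and it is not the real mbtolower. The point is only that the address of a scratch variable is passed, so the hot 'size' variable never has its address taken. Whether this changes the generated code depends on the compiler and optimization level.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mbtolower (assumption, ASCII only):
   return a malloc'd, NUL-terminated lowercase copy of the *N bytes
   at BUF, and store the copy's byte length through N.  */
static char *
lower_copy (char const *buf, size_t *n)
{
  char *out = malloc (*n + 1);
  if (!out)
    return NULL;
  for (size_t i = 0; i < *n; i++)
    out[i] = (char) tolower ((unsigned char) buf[i]);
  out[*n] = '\0';
  return out;
}

/* Count occurrences of byte C in the lowercased form of BUF.
   Only the scratch variable CASE_SIZE has its address taken,
   so SIZE itself can stay in a register for the loop below.  */
static size_t
count_lower (char const *buf, size_t size, char c)
{
  size_t case_size = size;
  char *case_buf = lower_copy (buf, &case_size);
  if (!case_buf)
    return 0;
  size = case_size;		/* copy the length back exactly once */

  size_t hits = 0;
  for (size_t i = 0; i < size; i++)
    hits += case_buf[i] == c;
  free (case_buf);
  return hits;
}
```

A call such as count_lower ("AbcA", 4, 'a') lowercases the four bytes and then scans them using the local copy of the length.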
From 3cd025a7b2700799e74457bd9c6d6d6e9e345a19 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Fri, 1 Jun 2012 08:59:50 -0700
Subject: [PATCH] grep: tune by allowing compiler to put size in register
* src/dfasearch.c (EGexecute):
* src/kwsearch.c (Fcompile, Fexecute):
Pass address of otherwise-unused variable to mbtolower, so that
the compiler can keep the often-used size variable in a register.
---
 src/dfasearch.c |    4 +++-
 src/kwsearch.c  |    8 +++++---
 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/dfasearch.c b/src/dfasearch.c
index bd09aa6..b7c0da7 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -223,10 +223,12 @@ EGexecute (char const *buf, size_t size, size_t *match_size,
         {
           /* mbtolower adds a NUL byte at the end.  That will provide
              space for the sentinel byte dfaexec may add.  */
-          char *case_buf = mbtolower (buf, &size);
+          size_t case_size = size;
+          char *case_buf = mbtolower (buf, &case_size);
           if (start_ptr)
             start_ptr = case_buf + (start_ptr - buf);
           buf = case_buf;
+          size = case_size;
         }
     }
diff --git a/src/kwsearch.c b/src/kwsearch.c
index f1a802e..829fe12 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -33,10 +33,10 @@ void
 Fcompile (char const *pattern, size_t size)
 {
   char const *err;
-  size_t psize = size;
   char const *pat = (match_icase && MB_CUR_MAX > 1
-                     ? mbtolower (pattern, &psize)
+                     ? mbtolower (pattern, &size)
                      : pattern);
+  size_t psize = size;
   kwsinit (&kwset);
@@ -87,10 +87,12 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
     {
       if (match_icase)
         {
-          char *case_buf = mbtolower (buf, &size);
+          size_t case_size = size;
+          char *case_buf = mbtolower (buf, &case_size);
           if (start_ptr)
             start_ptr = case_buf + (start_ptr - buf);
           buf = case_buf;
+          size = case_size;
         }
     }
--
1.7.6.5
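
Both execute hunks above keep the existing line that moves start_ptr from the original buffer into the lowercased copy. As a rough sketch of what that expression does, the hypothetical helper rebase_ptr below (not a grep function) maps a pointer to the same byte offset in another buffer. It assumes the two buffers have identical byte layout, which holds only when the case conversion does not change any character's byte length.

```c
#include <stddef.h>

/* Hypothetical helper (assumption, not part of grep): map P, which
   points into OLD_BUF, to the same byte offset inside NEW_BUF.
   Correct only if the two buffers share the same byte layout.  */
static char const *
rebase_ptr (char const *p, char const *old_buf, char const *new_buf)
{
  ptrdiff_t offset = p - old_buf;	/* distance from start of old buffer */
  return new_buf + offset;		/* same distance into the new buffer */
}
```

With an original buffer holding "HELLO" and a copy holding "hello", a pointer to the second 'L' in the original is mapped to the second 'l' in the copy.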