Re: [PATCH 1/2] dfa: convert to wide character line-by-line

bug-grep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] dfa: convert to wide character line-by-line

From:	Jim Meyering
Subject:	Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Date:	Wed, 05 May 2010 10:12:27 +0200

Jim Meyering wrote:

> Jim Meyering wrote:
>> Paolo Bonzini wrote:
>>> This provides a nice speedup for -m in general, but especially
>>> it avoids quadratic complexity in case we have to go to glibc.
>>>
>>> * NEWS: Document change.
>>> * src/dfa.c (prepare_wc_buf): Extract out of dfaexec.  Convert
>>> only up to the next newline.
>>> (dfaexec): Exit multibyte processing loop if past buf_end.
>>> Call prepare_wc_buf again after processing a newline.
>>
>> Nice indeed, but it induces an abort from glibc with some inputs:
>> This is on F13:
>>
>>     $ export LC_ALL=fr_FR.UTF-8
>>     $ printf '\xc3\n' > in; src/grep '[é]' in
>>     *** buffer overflow detected ***: src/grep terminated
>>     ======= Backtrace: =======
>>     /lib64/libc.so.6(__fortify_fail+0x37)[0x33920fedb7]
>>     /lib64/libc.so.6[0x33920fcd30]
>>     /lib64/libc.so.6(__strncpy_chk+0x17b)[0x33920fbfeb]
>>     ...
>>     zsh: abort (core dumped)  src/grep '[é]' in
>>
>>> +/* Initialize mblen_buf and inputwcs with data from the next line.  */
>>> +
>>> +static void
>>> +prepare_wc_buf (const char *begin, const char *end)
>>> +{
>>> +  unsigned char eol = eolbyte;
>>> +  size_t remain_bytes, i;
>>
>> Here's the quick summary:
>>
>>   "remain_bytes" must not be of type "size_t".
>>   Please leave it as "int", like it was in the code you moved.
>
> Perhaps better:
> Leave remain_bytes declared as size_t, since that's what mbrtowc
> returns, after all.
>
> But then you'll have to adjust the test to explicitly
> check for (size_t) -1 and (size_t) -2.
>
>            if (remain_bytes < 1
> +              || remain_bytes == (size_t) -1
> +              || remain_bytes == (size_t) -2
>                || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))

I see you pushed the buggy change-set,
so I've just pushed this fix:

>From e0ac660a2c1c1484baeb27716d7aeb37431b4ca3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 5 May 2010 10:09:47 +0200
Subject: [PATCH] dfa: avoid segfault when processing an invalid multi-byte 
sequence

* src/dfa.c (dfaexec): Handle the cases in which mbrtowc returns
(size_t)-1 or (size_t)-2, rather than setting mblen_buf[i] to an
outrageously large value.
---
 src/dfa.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 131449c..afcb55b 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3119,6 +3119,8 @@ prepare_wc_buf (const char *begin, const char *end)
           remain_bytes
             = mbrtowc(inputwcs + i, begin + i, end - begin - i + 1, &mbs);
           if (remain_bytes < 1
+              || remain_bytes == (size_t) -1
+              || remain_bytes == (size_t) -2
               || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
             {
               remain_bytes = 0;
--
1.7.1.335.g6845a

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH 1/2] dfa: convert to wide character line-by-line, Paolo Bonzini, 2010/05/04
- [PATCH 2/2] tests: add test for newly-fixed performance problem, Paolo Bonzini, 2010/05/04
  - Re: [PATCH 2/2] tests: add test for newly-fixed performance problem, Jim Meyering, 2010/05/04
    - Re: [PATCH 2/2] tests: add test for newly-fixed performance problem, Paolo Bonzini, 2010/05/05
    - Re: [PATCH 2/2] tests: add test for newly-fixed performance problem, Jim Meyering, 2010/05/05
    - Re: [PATCH 2/2] tests: add test for newly-fixed performance problem, Aharon Robbins, 2010/05/05
- Re: [PATCH 1/2] dfa: convert to wide character line-by-line, Jim Meyering, 2010/05/05
  - Re: [PATCH 1/2] dfa: convert to wide character line-by-line, Jim Meyering, 2010/05/05
    - Re: [PATCH 1/2] dfa: convert to wide character line-by-line, Jim Meyering <=
    - Re: [PATCH 1/2] dfa: convert to wide character line-by-line, Paolo Bonzini, 2010/05/05

Prev by Date: Re: [PATCH 2/2] tests: add test for newly-fixed performance problem
Next by Date: Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Previous by thread: Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Next by thread: Re: [PATCH 1/2] dfa: convert to wide character line-by-line
Index(es):
- Date
- Thread