[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/2] dfa: convert to wide character line-by-line
From: |
Jim Meyering |
Subject: |
Re: [PATCH 1/2] dfa: convert to wide character line-by-line |
Date: |
Wed, 05 May 2010 10:12:27 +0200 |
Jim Meyering wrote:
> Jim Meyering wrote:
>> Paolo Bonzini wrote:
>>> This provides a nice speedup for -m in general, but especially
>>> it avoids quadratic complexity in case we have to go to glibc.
>>>
>>> * NEWS: Document change.
>>> * src/dfa.c (prepare_wc_buf): Extract out of dfaexec. Convert
>>> only up to the next newline.
>>> (dfaexec): Exit multibyte processing loop if past buf_end.
>>> Call prepare_wc_buf again after processing a newline.
>>
>> Nice indeed, but it induces an abort from glibc with some inputs:
>> This is on F13:
>>
>> $ export LC_ALL=fr_FR.UTF-8
>> $ printf '\xc3\n' > in; src/grep '[é]' in
>> *** buffer overflow detected ***: src/grep terminated
>> ======= Backtrace: =======
>> /lib64/libc.so.6(__fortify_fail+0x37)[0x33920fedb7]
>> /lib64/libc.so.6[0x33920fcd30]
>> /lib64/libc.so.6(__strncpy_chk+0x17b)[0x33920fbfeb]
>> ...
>> zsh: abort (core dumped) src/grep '[é]' in
>>
>>> +/* Initialize mblen_buf and inputwcs with data from the next line. */
>>> +
>>> +static void
>>> +prepare_wc_buf (const char *begin, const char *end)
>>> +{
>>> + unsigned char eol = eolbyte;
>>> + size_t remain_bytes, i;
>>
>> Here's the quick summary:
>>
>> "remain_bytes" must not be of type "size_t".
>> Please leave it as "int", like it was in the code you moved.
>
> Perhaps better:
> Leave remain_bytes declared as size_t, since that's what mbrtowc
> returns, after all.
>
> But then you'll have to adjust the test to explicitly
> check for (size_t) -1 and (size_t) -2.
>
> if (remain_bytes < 1
> + || remain_bytes == (size_t) -1
> + || remain_bytes == (size_t) -2
> || (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
I see you pushed the buggy change-set,
so I've just pushed this fix:
>From e0ac660a2c1c1484baeb27716d7aeb37431b4ca3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 5 May 2010 10:09:47 +0200
Subject: [PATCH] dfa: avoid segfault when processing an invalid multi-byte
sequence
* src/dfa.c (dfaexec): Handle the cases in which mbrtowc returns
(size_t)-1 or (size_t)-2, rather than setting mblen_buf[i] to an
outrageously large value.
---
src/dfa.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/src/dfa.c b/src/dfa.c
index 131449c..afcb55b 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3119,6 +3119,8 @@ prepare_wc_buf (const char *begin, const char *end)
remain_bytes
= mbrtowc(inputwcs + i, begin + i, end - begin - i + 1, &mbs);
if (remain_bytes < 1
+ || remain_bytes == (size_t) -1
+ || remain_bytes == (size_t) -2
|| (remain_bytes == 1 && inputwcs[i] == (wchar_t)begin[i]))
{
remain_bytes = 0;
--
1.7.1.335.g6845a
Re: [PATCH 1/2] dfa: convert to wide character line-by-line, Jim Meyering, 2010/05/05