[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
newly-exposed errors with invalid-MB input
From: |
Jim Meyering |
Subject: |
newly-exposed errors with invalid-MB input |
Date: |
Fri, 02 Apr 2010 09:57:19 +0200 |
Yesterday, just before I planned to release,
I ran all of "make check" through valgrind.
That showed up some new failures.
The first few here (main/Ecompile/GEAcompile) are not new,
but the others are.
They arise because of this code:
static void
addtok_wc (wint_t wc)
{
unsigned char buf[MB_LEN_MAX];
mbstate_t s;
int i;
memset (&s, 0, sizeof(s));
cur_mb_len = wcrtomb ((char *) buf, wc, &s);
addtok_mb(buf[0], cur_mb_len == 1 ? 3 : 1);
for (i = 1; i < cur_mb_len; i++)
{
addtok_mb(buf[i], i == cur_mb_len - 1 ? 2 : 0);
addtok(CAT);
}
}
It calls addtok_mb with uninitialized buf[0] even
when wcrtomb has returned -1.
[Note, that to reproduce this, you'll probably have to
install the shift-JIS locale. These commands worked for
me on Fedora-based systems:
d=/usr/lib/locale/ja_JP.SHIFT_JIS
mkdir $d && zcat /usr/share/i18n/charmaps/SHIFT_JIS.gz \
| localedef -f - -i /usr/share/i18n/locales/ja_JP $d
]
$ printf '\203\n' > in
$ LC_ALL=ja_JP.SHIFT_JIS valgrind src/grep -E $'\203' in
==16670== Memcheck, a memory error detector
==16670== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==16670== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==16670== Command: src/.vg-tmp/grep -E in
==16670==
==16670== Conditional jump or move depends on uninitialised value(s)
==16670== at 0x398F6E2617: iswalnum (wcfuncs.c:41)
==16670== by 0x4202ED: peek_token (regcomp.c:1928)
==16670== by 0x41FD8C: fetch_token (regcomp.c:1766)
==16670== by 0x42089B: parse (regcomp.c:2116)
==16670== by 0x41D716: re_compile_internal (regcomp.c:816)
==16670== by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
==16670== by 0x402535: GEAcompile (dfasearch.c:156)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
==16670==
==16670== Use of uninitialised value of size 8
==16670== at 0x398F6E2630: iswalnum (wcfuncs.c:41)
==16670== by 0x4202ED: peek_token (regcomp.c:1928)
==16670== by 0x41FD8C: fetch_token (regcomp.c:1766)
==16670== by 0x42089B: parse (regcomp.c:2116)
==16670== by 0x41D716: re_compile_internal (regcomp.c:816)
==16670== by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
==16670== by 0x402535: GEAcompile (dfasearch.c:156)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
==16670==
==16670== Conditional jump or move depends on uninitialised value(s)
==16670== at 0x4202F6: peek_token (regcomp.c:1928)
==16670== by 0x41FD8C: fetch_token (regcomp.c:1766)
==16670== by 0x42089B: parse (regcomp.c:2116)
==16670== by 0x41D716: re_compile_internal (regcomp.c:816)
==16670== by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
==16670== by 0x402535: GEAcompile (dfasearch.c:156)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
==16670==
==16670== Conditional jump or move depends on uninitialised value(s)
==16670== at 0x40C598: addtok_mb (dfa.c:1127)
==16670== by 0x40C6B7: addtok_wc (dfa.c:1178)
==16670== by 0x40C743: atom (dfa.c:1228)
==16670== by 0x40C9C3: closure (dfa.c:1308)
==16670== by 0x40CAFE: branch (dfa.c:1341)
==16670== by 0x40CB45: regexp (dfa.c:1352)
==16670== by 0x40CC48: dfaparse (dfa.c:1393)
==16670== by 0x411AB0: dfacomp (dfa.c:3043)
==16670== by 0x4026E1: GEAcompile (dfasearch.c:195)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
==16670==
==16670== Conditional jump or move depends on uninitialised value(s)
==16670== at 0x4125D9: dfamust (dfa.c:3421)
==16670== by 0x411ABC: dfacomp (dfa.c:3044)
==16670== by 0x4026E1: GEAcompile (dfasearch.c:195)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
==16670==
==16670== Conditional jump or move depends on uninitialised value(s)
==16670== at 0x4125E4: dfamust (dfa.c:3421)
==16670== by 0x411ABC: dfacomp (dfa.c:3044)
==16670== by 0x4026E1: GEAcompile (dfasearch.c:195)
==16670== by 0x402135: Ecompile (grep.c:16)
==16670== by 0x40748B: main (main.c:2141)
[... there are many more that I have elided...]
My first attempt at a noninvasive fix was to return upon wcrtomb failure,
thus avoiding any reference to buf[0] when cur_mb_len <= 0.
However, that makes it even worse (or maybe "better" as in more apparent ;-)
and results in heap corruption.
So I've done this instead, as a stop-gap measure:
[With it, we're back to the status-quo of "just" the
three re_compile_internal violations. ]
>From 48a4826e2f97d7d6b27b189f46746c0d8417089f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 2 Apr 2010 09:51:18 +0200
Subject: [PATCH] grep: avoid used-undefined error with truncated multibyte input
* src/dfa.c (addtok_wc): Don't use buf[0] (it's undefined) when
wcrtomb returns <= 0.
MBS_SUPPORT-removal: * src/dfa.c (dfastate):
---
src/dfa.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/src/dfa.c b/src/dfa.c
index c2ef18c..ba78b08 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1175,6 +1175,13 @@ addtok_wc (wint_t wc)
int i;
memset (&s, 0, sizeof(s));
cur_mb_len = wcrtomb ((char *) buf, wc, &s);
+
+ /* This is merely stop-gap. When cur_mb_len is 0 or negative,
+ buf[0] is undefined, yet skipping the addtok_mb call altogether
+ can result in heap corruption. */
+ if (cur_mb_len <= 0)
+ buf[0] = 0;
+
addtok_mb(buf[0], cur_mb_len == 1 ? 3 : 1);
for (i = 1; i < cur_mb_len; i++)
{
--
1.7.0.3.513.gc8ed0
I'll push this soon, and then make the release.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- newly-exposed errors with invalid-MB input,
Jim Meyering <=