Re: Bug in gawk 3.1.4
From: Aharon Robbins
Subject: Re: Bug in gawk 3.1.4
Date: Thu, 25 Nov 2004 13:57:29 +0200
Greetings. Re this:
> From: Bruce Lilly <address@hidden>
> To: address@hidden
> Subject: Bug in gawk 3.1.4
> Date: Wed, 24 Nov 2004 21:11:06 -0500
>
> Hello,
>
> I found a bug when running Jon Bentley's dformat awk script (CSTR 142).
> All tests described below on SuSE Linux 9.1 Professional, gawk built from
> source using gcc 3.4.3, bison 1.875d (both also built from source). Gawk
> passes make check "ALL TESTS PASSED".
>
> I have attached a simplified awk script and test data file.
>
> The simplified script when run on the simple data file shows a bug in
> gawk pattern matching. The same script and data with "the one
> true awk" from Brian Kernighan's web site, also built from source,
> same compiler, etc. works fine:
>
> marty:/src/gawk/gawk-3.1.4 # ./gawk -f awktest data
> line begins with non-whitespace: left
> line begins with whitespace: space
> line begins with whitespace: tab
> line begins with whitespace: left
> line begins with whitespace: space
> line begins with whitespace: tab
> marty:/src/gawk/gawk-3.1.4 # /usr/bin/awk -f awktest data
> line begins with non-whitespace: left
> line begins with whitespace: space
> line begins with whitespace: tab
> line begins with non-whitespace: left
> line begins with whitespace: space
> line begins with whitespace: tab
>
> I haven't determined precisely where the bug is, but it's clear that
> there is a bug. Note that gawk fails to match the second line
> which begins with a non-whitespace character to the input
> pattern /^[^ \t]/.
>
> Best regards,
> Bruce Lilly
If you use `export LC_ALL=C' the problem will be hidden. Otherwise,
you can apply this patch.
Thanks,
Arnold
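
For anyone who wants to try the workaround before patching, here is a minimal sketch. It reuses the reproduction from the comment in the patch below, with one assumption: plain `awk' stands in for the freshly built ./gawk, so substitute the binary you are testing. LC_ALL=C is set for the single command only, which avoids changing the locale of the whole shell session.

```shell
# Reproduction input from the patch comment: one line starting with a
# letter, one line starting with a tab.
# Assumption: "awk" stands in for the ./gawk binary under test.
# LC_ALL=C applies to this one command only; under it MB_CUR_MAX is 1,
# so the multibyte branch touched by the patch is not taken.
result=$(printf 'ab\n\tb\n' | LC_ALL=C awk '/^[ \t]/ { print }')

# Only the tab-indented line should have matched.
printf '%s\n' "$result"
```

Running the same pipeline with LC_ALL=de_DE.UTF-8, as in the patch comment, is the case that goes wrong on an unpatched gawk 3.1.4, provided that locale is installed.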
--- ../gawk-3.1.4/dfa.c 2004-07-26 17:11:41.000000000 +0300
+++ dfa.c 2004-10-21 17:12:19.000000000 +0200
@@ -2871,6 +2871,14 @@
if (MB_CUR_MAX > 1)
{
int remain_bytes, i;
+#if 0
+ /*
+ * This caching can get things wrong:
+
+ printf "ab\n\tb\n" | LC_ALL=de_DE.UTF-8 ./gawk '/^[ \t]/ { print }'
+
+ * should print \tb but doesn't
+ */
buf_begin -= buf_offset;
if (buf_begin <= (unsigned char const *)begin && (unsigned char const *)
end <= buf_end) {
buf_offset = (unsigned char const *)begin - buf_begin;
@@ -2878,6 +2886,7 @@
buf_end = end;
goto go_fast;
}
+#endif
buf_offset = 0;
buf_begin = begin;
@@ -2916,7 +2925,9 @@
mblen_buf[i] = 0;
inputwcs[i] = 0; /* sentinel */
}
+#if 0
go_fast:
+#endif
#endif /* MBS_SUPPORT */
for (;;)
@@ -2930,7 +2941,7 @@
s1 = s;
if (d->states[s].mbps.nelem != 0)
{
- /* Can match with a multibyte character( and multi character
+ /* Can match with a multibyte character (and multi character
collating element). */
unsigned char const *nextp;
@@ -3668,9 +3679,9 @@
done:
if (strlen(result))
{
- dm = (struct dfamust *) malloc(sizeof (struct dfamust));
+ MALLOC(dm, struct dfamust, 1);
dm->exact = exact;
- dm->must = malloc(strlen(result) + 1);
+ MALLOC(dm->must, char, strlen(result) + 1);
strcpy(dm->must, result);
dm->next = dfa->musts;
dfa->musts = dm;