Re: [bug-gawk] gawk character class bug
From: Aharon Robbins
Subject: Re: [bug-gawk] gawk character class bug
Date: Mon, 11 Mar 2013 22:44:20 +0200
User-agent: Heirloom mailx 12.5 6/20/10
Hi Ed. Re this:
> Date: Thu, 7 Mar 2013 18:00:01 +0000 (UTC)
> From: Ed Morton <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] gawk character class bug
>
> There's a bug in gawk wrt issuing a warning for character classes being
> used in a bracket expression. I want to remove the first character
> in a string that's either a closing square bracket "]" or a space
> "[:space:]". In /usr/xpg4/bin/awk on Solaris I can do this:
>
> $ echo "a]b" | /usr/xpg4/bin/awk '{sub(/[][:space:]]/,"")}1'
> ab
>
> In gawk it works just fine too and does correctly understand that
> [:space:] is a character class BUT I get an unwarranted warning
> message:
>
> $ echo "a]b" | gawk '{sub(/[][:space:]]/,"")}1'
> gawk: cmd. line:1: warning: regexp component `[:space:]' should probably be `[[:space:]]'
> ab
>
> See the discussion at
> https://groups.google.com/forum/#!topic/comp.lang.awk/MGX4VKyuv0k for
> more info.
>
> Regards,
>
> Ed Morton.
So, it is a (small) bug. The fix is below. The patch is relative to the
gawk-4.0-stable branch. I will check in the fix and a new test in the
stable and master branches shortly.
Thanks,
Arnold
---------------------------------------------------
diff --git a/re.c b/re.c
index 711b53e..4c03177 100644
--- a/re.c
+++ b/re.c
@@ -564,8 +564,22 @@ again:
 		if (*sp == '[')
 			count++;
-		else if (*sp == ']')
-			count--;
+		/*
+		 * ] as first char after open [ is skipped
+		 * \] is skipped
+		 * [^]] is skipped
+		 */
+		if (*sp == ']' && sp > sp2) {
+			 if (sp[-1] != '['
+			     && sp[-1] != '\\')
+				 ;
+			 else if ((sp - sp2) >= 2
+				  && sp[-1] == '^' && sp[-2] == '[')
+				 ;
+			 else
+				 count--;
+		}
+
 		if (*sp == '-' && do_lint && ! range_warned && count == 1
 		    && sp[-1] != '[' && sp[1] != ']'
 		    && ! isdigit((unsigned char) sp[-1]) && ! isdigit((unsigned char) sp[1])
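
---------------------------------------------------
For readers following the patch, the bracket-expression rules behind the
report can be illustrated with a small standalone sketch. This is not
gawk's code and not the logic of the diff above; bracket_end() is a
hypothetical helper written only for illustration. It assumes plain POSIX
rules: a "]" directly after "[" or "[^" is a literal, and "[:", "[." and
"[=" open a nested element that runs to the matching ":]", ".]" or "=]".
It deliberately does not handle the "\]" case that the patch comment
mentions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not part of gawk):
 * given re[open] == '[', return the index of the ']' that closes the
 * bracket expression, or -1 if the expression is unterminated.
 */
long bracket_end(const char *re, size_t open)
{
    assert(re != NULL && re[open] == '[');

    size_t i = open + 1;

    if (re[i] == '^')       /* optional negation */
        i++;
    if (re[i] == ']')       /* ']' in first position is a literal */
        i++;

    while (re[i] != '\0') {
        if (re[i] == '['
            && (re[i + 1] == ':' || re[i + 1] == '.' || re[i + 1] == '=')) {
            /* nested [:class:], [.coll.] or [=equiv=]: skip to its end */
            char kind = re[i + 1];
            const char *p = re + i + 2;

            while (*p != '\0' && !(p[0] == kind && p[1] == ']'))
                p++;
            if (*p == '\0')
                return -1;  /* nested element never closed */
            i = (size_t)(p - re) + 2;
            continue;
        }
        if (re[i] == ']')
            return (long) i;
        i++;
    }
    return -1;
}
```

With Ed's regexp, bracket_end("[][:space:]]", 0) returns 11, the index of
the final "]". Under these rules the "[:space:]" in that regexp sits
inside a single bracket expression, which is why no warning is called for.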