Re: [bug-gawk] problem with 'match'
From: Aharon Robbins
Subject: Re: [bug-gawk] problem with 'match'
Date: Thu, 19 May 2011 16:30:20 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Greetings. Re this:
> From: Guy Léonis <address@hidden>
> To: "address@hidden" <address@hidden>
> Date: Tue, 17 May 2011 11:00:57 +0000
> Subject: [bug-gawk] problem with 'match'
>
> Dear awk management team,
>
> I have created, on top of gawk, a tool to extract tags and links from HTML files.
> With gawk V3.1.1 from RHEL V3.0 WS (32b), the tool works perfectly.
> With gawk V3.1.8 from Fedora Core 14 (64b), the tool exhibits problems.
> Here is a simple instance to reproduce the problem:
>
> awk 3.1.1:
> ./awk '{printf "%d\n", match($0, "<[ ]*[aA][ ]+[hH][rR][eE][fF]")}' test.txt
> gives 209, the expected result
>
> awk 3.1.8:
> /bin/awk '{printf "%d\n", match($0, "<[ ]*[aA][ ]+[hH][rR][eE][fF]")}' test.txt
> gives 1, an erroneous value
>
> Please note that line selection with the same regular expression works
> properly with both awk versions.
>
> I hope it helps. Best regards,
>
> Guy Léonis
>
> Spacebel Tel : +32-26 58 20 27
> I. Vandammestraat 7 bus 1 Fax : +32-26 58 20 90
> B-1560 Hoeilaart email : address@hidden
> BELGIUM
This is a bug, related to UTF locales. It is fixed in the current
code base. I believe that this is the relevant diff.
As mentioned, you can check out the current code base using git.
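Until you can move to the fixed code, a possible stopgap is to run the
script in the C locale. This is only a sketch: it assumes the problem is
triggered solely in multibyte (UTF-8) locales, and the sample line below
is made up for illustration, it is not taken from your test.txt.

```shell
# Hypothetical sample input; the "<a href" tag starts at character 4.
line='xx <a href="u">'

# LC_ALL=C forces single-byte character handling for this one command,
# so match() should not go through the multibyte index-building code.
pos=$(printf '%s\n' "$line" |
      LC_ALL=C awk '{ print match($0, /<[ ]*[aA][ ]+[hH][rR][eE][fF]/) }')

echo "$pos"
```

Note that in the C locale awk treats the input as bytes, so for lines
that contain multibyte characters before the match, the value returned
is a byte offset rather than a character offset.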
Thanks,
Arnold
------------------------
Index: node.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/node.c,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- node.c 14 Jul 2010 20:26:49 -0000 1.25
+++ node.c 5 Sep 2010 17:48:03 -0000 1.26
@@ -811,6 +811,7 @@
* for match() where we need to build the indices.
*/
sp++;
+ src_count--;
/*
* mbrtowc(3) says the state of mbs becomes undefined
* after a bad character, so reset it.