Re: [bug-gawk] problem with 'match'


From: Aharon Robbins
Subject: Re: [bug-gawk] problem with 'match'
Date: Thu, 19 May 2011 16:30:20 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Greetings. Re this:

> From: Guy Léonis <address@hidden>
> To: "address@hidden" <address@hidden>
> Date: Tue, 17 May 2011 11:00:57 +0000
> Subject: [bug-gawk] problem with 'match'
>
> Dear awk management team,
>
> I have created, on top of gawk, a tool to extract tags and links from HTML files.
> With gawk V3.1.1 from RHEL V3.0 WS (32b), the tool is perfect.
> With gawk V3.1.8 from Fedora Core 14 (64b), the tool exhibits problems.
> Here is a simple case that reproduces the problem:
>
> awk 3.1.1:
> ./awk '{printf "%d\n", match($0, "<[ ]*[aA][ ]+[hH][rR][eE][fF]")}' test.txt
> gives 209, the expected result
>
> awk 3.1.8:
> /bin/awk '{printf "%d\n", match($0, "<[ ]*[aA][ ]+[hH][rR][eE][fF]")}' test.txt
> gives 1, an erroneous value
>
> Please note that line selection with the same regular expression works 
> properly with both awk versions.
>
> I hope it helps. Best regards,
>
> Guy Léonis
>
> Spacebel                    Tel   : +32-26 58 20 27 
> I. Vandammestraat 7 bus 1   Fax   : +32-26 58 20 90 
> B-1560 Hoeilaart            email : address@hidden 
> BELGIUM

This is a bug related to UTF-8 locales.  It is fixed in the current
code base.  I believe that this is the relevant diff.
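
Until a fixed gawk is installed, one possible workaround is to run the
script in the C locale, so that match() works on plain bytes and should
not go through the multibyte index-building code at all.  This is an
assumption, not something confirmed in this thread.  The sketch below
uses a made-up sample line, since the original test.txt is not shown;
the lone 0xE9 byte stands in for a character that is not valid UTF-8.

```shell
# Hypothetical input: "caf", a lone 0xE9 byte (invalid as UTF-8), a space, then an <a href> tag.
# LC_ALL=C makes awk treat the line as plain bytes, so match() returns a byte offset.
pos=$(printf 'caf\351 <a href="x">\n' |
      LC_ALL=C awk '{ printf "%d\n", match($0, "<[ ]*[aA][ ]+[hH][rR][eE][fF]") }')
echo "$pos"    # prints 6: the "<" is the sixth byte of the line
```

Note that in the C locale the value is a byte offset, which can differ
from the character offset reported in a UTF-8 locale when the line
contains valid multibyte characters before the match.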

As mentioned, you can check out the current code base using git.

Thanks,

Arnold
------------------------

Index: node.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/node.c,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- node.c      14 Jul 2010 20:26:49 -0000      1.25
+++ node.c      5 Sep 2010 17:48:03 -0000       1.26
@@ -811,6 +811,7 @@
                         * for match() where we need to build the indices.
                         */
                        sp++;
+                       src_count--;
                        /*
                         * mbrtowc(3) says the state of mbs becomes undefined
                         * after a bad character, so reset it.


