Re: [bug-gawk] regexp RS mangling input
From: Aharon Robbins
Subject: Re: [bug-gawk] regexp RS mangling input
Date: Fri, 10 Aug 2012 13:27:23 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Hello. Apologies for the long delay in replying to this.
> Date: Sun, 20 May 2012 01:05:52 -0400
> From: Jay Michael <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] regexp RS mangling input
>
> I'm using a regular expression as RS to soak up everything I don't
> want to see while parsing my input. I want the record terminator to
> include possibly multi-line expanses enclosed in braces.
>
> The first problem I had, gawk seemed to be returning the same
> string for several consecutive internal records. When I tried to track
> down what I was doing wrong, my reduced test case caused gawk to include
> what should have been the first record in the first record's terminator,
> while ending the terminator before the end of the second "comment".
> Then, gawk acted like each character was a record terminator.
>
> I'm running GNU Awk 3.1.3 under Windows XP. I don't know who
> built it, I don't remember where I got it. I tried on a UNIX/Linux
> shell to which I have access. It was running 3.1.1 (or so), it behaved
> the same way as the version on my PC.
You can get a current version of gawk (4.0.1) for Windows from:
http://sourceforge.net/projects/ezwinports/files/
which I highly recommend doing. 3.1.3 is almost ten years old.
> I have attached my program (d.awk) and input (d.i). d.log is not
> really a log file -- I pasted pieces and then appended the output of
> "gawk -f d.awk d.i".
I think the problem is that your regex for RS is too inclusive. You
have
RS = "([ \\n]|(" re_bcom "))*" ;
I believe that the space is giving you problems; it causes each space
to act as a record separator, which is likely not what you want.
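As a small illustration of that effect (a sketch, not your actual program):
the RS below is a simplified, hypothetical stand-in that keeps only the space
and newline alternatives and uses + instead of *. It assumes an awk that
accepts a regular expression as RS, as gawk does. The function name
split_on_blanks is made up for the example.

```shell
# Hypothetical reduced case: every run of spaces or newlines ends a
# record, so each word of the input becomes its own record.
split_on_blanks() {
    awk 'BEGIN { RS = "[ \n]+" } { print NR ": " $0 }'
}

# "a" and "b" share a line but still come out as separate records.
printf 'a b\nc\n' | split_on_blanks
```

With an RS like this, a space inside a line ends the record just as a
newline does.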
Crafting a regular expression can be difficult if you are trying to
match highly variable input. You may want to use a simpler RS and
use sub, gsub, or gensub to remove the stuff you don't want from the
record before processing it.
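A minimal sketch of that approach, under some assumptions that may not match
your data: RS is left at its default (newline), braces appear only as comment
delimiters, and comments do not nest. The names strip_brace_comments, line,
and in_comment are made up for the example.

```shell
# Keep the default RS and strip brace-enclosed comments from each
# record; a flag tracks comments that span several lines.
strip_brace_comments() {
    awk '
    {
        line = $0
        # Inside a multi-line comment: drop text up to the closing brace.
        if (in_comment) {
            i = index(line, "}")
            if (i == 0)
                next
            line = substr(line, i + 1)
            in_comment = 0
        }
        # Remove comments that open and close on this line.
        gsub(/[{][^}]*[}]/, "", line)
        # A leftover "{" opens a comment that continues on later lines.
        i = index(line, "{")
        if (i > 0) {
            line = substr(line, 1, i - 1)
            in_comment = 1
        }
        # Trim leading and trailing blanks, then skip empty records.
        gsub(/^[ \t]+|[ \t]+$/, "", line)
        if (line != "")
            print line
    }'
}

printf 'a {c1} b\nc {multi\nline} d\n' | strip_brace_comments
```

Real processing would go where the sketch calls print. The point is only
that the comment handling lives in ordinary awk statements rather than in
the RS regex.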
HTH,
Arnold