Re: [bug-gawk] Regex treatment of NUL characters within fields

From:	arnold
Subject:	Re: [bug-gawk] Regex treatment of NUL characters within fields
Date:	Mon, 30 Mar 2015 08:03:46 -0600
User-agent:	Heirloom mailx 12.4 7/29/08

Hi.

This is a fascinating bug report.  It looks like a bug in the regex
matching, since the $ in "^7$" should not match when NUL bytes follow
the 7.  Debugging this will be a challenge, but I'll take a look.

In the meantime, something like tr -d '\0' can be used to strip the
NUL bytes from an input file before gawk reads it.
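
A rough sketch of that workaround, in case it helps. The sample line
and the variable names (sample, nul_count, result) are made up for
illustration. Comparing the byte counts before and after stripping is
my own addition, as one possible way to still flag a file as corrupt
once the NULs are gone.

```shell
# Hypothetical sample: ten fields split on '#', the tenth being "7"
# followed by three NUL bytes.
sample=$(mktemp)
printf 'a#b#c#d#e#f#g#h#i#7\000\000\000\n' > "$sample"

# Count NUL bytes by comparing the size before and after stripping.
total=$(wc -c < "$sample")
stripped=$(tr -d '\0' < "$sample" | wc -c)
nul_count=$((total - stripped))

# Strip the NULs, then run the original pattern on the cleaned input.
result=$(tr -d '\0' < "$sample" |
    awk 'BEGIN { FS = "[#/]"; OFS = ":" } $10 ~ "^7$" { print NR, $10 }')

echo "NUL bytes: $nul_count"
echo "match: $result"
rm -f "$sample"
```

A nonzero nul_count marks the file as one that contained NULs, and
the awk pattern no longer has to deal with them.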

I will work on this.

Thanks,

Arnold

Matt Wenham <address@hidden> wrote:

> I have found a use case which has made me unsure as to how gawk 4.1.1
> treats NUL characters within fields and how they are parsed by the
> regex engine.
>
> I have a series of files which I am trying to process and validate
> using gawk. A small number of the files are corrupt and contain runs
> of NUL characters which I would like to reject as invalid.
>
> I tried the following code:
>
> BEGIN {
>     FS="[#/]"   #Split at hash or slash
>     OFS = ":"
> }
>
> $10 ~ "^7$" {
>     print NR, $10
> }
>
> This successfully matches the digit '7' followed by a run of NULs in
> the tenth field. However, using
>
> $10 ~ "^7\0+$"
>
> fails to match the same tenth field despite the explicitly specified
> NUL character. From everything I've read, this is unexpected
> behaviour.
>
> I am using GnuWin32 in this case. I asked about the issue on
> Stack Overflow, and another user has found that this behaviour does
> not occur with gawk 3.1.5 on CentOS 5, but does occur with gawk 4.1.1
> on Debian unstable.
>
> Is this expected behaviour? If so, how? Is it possible to
> successfully parse NUL characters in 4.1.1?
>
> Many thanks,
>
> Dr. Matt Wenham.
