Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields


From: arnold
Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
Date: Sun, 21 May 2017 21:01:24 -0600
User-agent: Heirloom mailx 12.4 7/29/08

I think I'll go with this. Thanks for the feedback.

Arnold

"Andrew J. Schorr" <address@hidden> wrote:

> On Sun, May 21, 2017 at 10:12:44PM +0300, Arnold Robbins wrote:
> > Q1. Given FIELDWIDTHS = "2 3 4" and input data "aabb". How many fields
> >    should there be?
> >    A. Two, since that's all the data that's there
> >    B. Three, with $3 == "", since it's supposed to be all fixed width data
> > 
> > A1. Gawk currently says three. Arnold leans towards two, since it reflects
> >     the actual data and allows code expecting three fields to weed out
> >     bad records.
>
> I agree.
>
> > Q2. Given FIELDWIDTHS = "2 3 4" and input data "aab", should $2 have a
> >     value?
> >     A. No - we're expecting three characters and they weren't all there
> >     B. Yes - something was there, make it available
> > 
> > A2. Gawk currently says "yes".  Arnold isn't sure what's right here.
> >     Input is welcome.
>
> I agree with current behavior (B).
>
> > Q3. Given FIELDWIDTHS = "2 3 4" and input data "aabbbccccdddd", what should
> >     be done with the dddd?
> >     A. Nothing - it's extra, ignore it. NF should be set to 3. Code that
> >        wants to know if there's something extra can use length() and
> >        substr() to get it out of the record.
> >     B. Stick it into $4 anyway.
> > 
> > A3. Arnold and gawk agree on (A).
>
> Since we plan to add support for trailing "*" as in Q4 below, I would
> choose the approach that is easiest to implement. I think that's probably A,
> since that's what we do now. Those who are interested in trailing data
> can use "*".
>
> > Q4. Given the idea of using "*" at the end of FIELDWIDTHS to mean
> >     "anything else", then with FIELDWIDTHS = "2 3 4 *" and input
> >     data "aabbbccccdddd", the dddd would go into $4. The final data
> >     would be optional.  Is there any reason not to add this to gawk?
> >     It seems to be actually useful and not just theoretically useful.
> > 
> > A4. Arnold thinks it's right to add it.
>
> Agreed. I presume that NF will be 3 if the record length is 9 and 4 for
> 10 or longer.
>
> Regards,
> Andy
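
For reference, the four answers settled on above can be modelled in a few lines. This is only a Python sketch of the agreed semantics, not gawk's actual C implementation. The function name split_fixed is hypothetical, and the sketch assumes FIELDWIDTHS contains only plain integer widths plus an optional trailing "*".

```python
def split_fixed(record, fieldwidths):
    """Model the FIELDWIDTHS rules discussed in this thread (illustrative only)."""
    specs = fieldwidths.split()
    # Q4: a trailing "*" means "whatever is left goes into one more field".
    rest_field = bool(specs) and specs[-1] == "*"
    if rest_field:
        specs = specs[:-1]

    fields = []
    pos = 0
    for width in map(int, specs):
        if pos >= len(record):
            break  # Q1: no data left, so no further (empty) fields are created
        # Q2: a short final field keeps whatever characters are present
        fields.append(record[pos:pos + width])
        pos += width

    if rest_field and pos < len(record):
        fields.append(record[pos:])  # Q4: optional trailing data
    # Q3: without "*", data past the last width is ignored
    return fields


if __name__ == "__main__":
    cases = [
        ("aabb", "2 3 4"),             # Q1
        ("aab", "2 3 4"),              # Q2
        ("aabbbccccdddd", "2 3 4"),    # Q3
        ("aabbbccccdddd", "2 3 4 *"),  # Q4, trailing data present
        ("aabbbcccc", "2 3 4 *"),      # Q4, record length exactly 9
    ]
    for record, widths in cases:
        fields = split_fixed(record, widths)
        print(f"FIELDWIDTHS={widths!r} record={record!r} NF={len(fields)} {fields}")
```

Under this model, NF for "2 3 4 *" is 3 when the record is exactly 9 characters long and 4 when it is 10 or longer, matching Andy's presumption above.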


