Re: [bug-gawk] built-in variables in extensions


From: Aharon Robbins
Subject: Re: [bug-gawk] built-in variables in extensions
Date: Tue, 18 Dec 2012 18:58:31 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

I finally understand what Assaf is trying to accomplish.

> On Mon, Dec 17, 2012 at 04:16:15PM -0500, Assaf Gordon wrote:
> > Ideally, to have a way to do the following (assuming the user knows
> > there's a column named "age"):
> > 
> >     gawk -l header '$age>43' INPUT > OUTPUT
> > 
> > That's all - concise and very clear.
>
> Using the current git master tree, SYMTAB seems to do what you want.
> However, based on Arnold's comments, I'm guessing this may not be supported
> in the future:
>
> bash-4.1$ cat /tmp/header.awk
> NR == 1 {
>         for (i = 1; i <= NF; i++)
>                 SYMTAB[$i] = i
>         next
> }
>
> bash-4.1$ ( echo "name age" ; echo "Jim 10" ) | AWKPATH=/tmp ./gawk -i header '{print $age}'
> 10

First, the above is supposed to work. It makes sense.

Let me clarify how things work, and then hopefully you'll understand
why SYMTAB works the way it does, and what you can and can't do with it.

Remember that there are two stages: parse time and run time.

At parse time, gawk parses the program and generates byte code to run.
Part of parsing the program is tracking the identifiers used as functions
and variables (scalars / arrays).  By the time this step is done, SYMTAB
contains all the variables and arrays used in the program, as well as
gawk's built-in variables and arrays.

So, by definition, at this point gawk knows which string indices into
SYMTAB represent real variables.  Any other index (which can only be
created at run time) is a simple array element, just like in any other
array.

Let's look at the invocation:

        gawk -i header '{print $age}'

Because the program text mentions "age", gawk knows before it starts to
execute any code that "age" is a variable; anything else in SYMTAB is just
an array element.

The first thing executed is this:

        NR == 1 {
                for (i = 1; i <= NF; i++)
                        SYMTAB[$i] = i
                next
        }

Fine and dandy. The actual "age" variable is updated to 2, as expected, and
SYMTAB["name"] gets 1, but nothing in the program uses it, so we don't care.

Using this example, I can now explain what I meant when I said "you can't
do that":  You can't use "name" as a variable that magically comes into
existence at run time.  Variables all come into existence, by definition,
at parse time.

So when, as you stated, the user knows that "age" is a column header,
there's no problem when she types the above command line.

> But suppose this is not a stable feature.

It should be stable, as long as you understand what's going on.

> Here is another approach:
>
> bash-4.1$ cat /tmp/header.awk
> @load "create_variable"
>
> NR == 1 {
>         for (i = 1; i <= NF; i++)
>                 create_variable($i, i)
>         next
> }
>
> where "create_variable" is a shared library extension that exports the
> function create_variable that creates a variable whose name is given
> in the first argument and value in the second.

This is also true, but I believe it to be unnecessary.

Thanks,

Arnold


