Re: article about gawk best practices in data science and feature propos

bug-gawk

From:	Andrew J. Schorr
Subject:	Re: article about gawk best practices in data science and feature proposal
Date:	Thu, 11 Feb 2021 09:17:45 -0500
User-agent:	Mutt/1.5.21 (2010-09-15)

Hi,

On Thu, Feb 11, 2021 at 10:53:19AM +0100, Ivan Molineris wrote:
> Moreover, one of the biggest drawbacks of gawk in our field is that
> referring to input columns by number often produces hard-to-read
> scripts.
> For this reason, the wrapper I commonly use makes it possible to refer to
> columns not only by number, but also by name.
> 
> For example, if a file is composed like this:
> 
> chromosome     start        end
>       chr1       241      53521
>       chr1       363      43623
>       chr2      5243     234562
> 
> gawk '{l=$3-$2}'
> can also be written as
> gawk '{l=$end-$start}'
> 
> I know that this syntax is not backward compatible; maybe it can be improved.
> 
> Do you know whether anyone has considered a feature like this one in the
> past?

Regarding this point: I often have files like this with a
header title row. I typically do something like this:

gawk '
NR == 1 {
  for (i = 1; i <= NF; i++)
    m[$i] = i
  # optional: check that all required columns are present
  next
}

{
  # to take your example
  l = $m["end"]-$m["start"]
}'

To me, this is more elegant than hardcoding

gawk -vstart=2 -vend=3 'NR > 1 {l = $end-$start}'
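
For completeness, here is a minimal end-to-end sketch of the header-mapping approach, run on the sample data quoted above. A few things in it are assumptions for illustration rather than part of the snippet earlier in this message: the col_lengths function name, the AWK variable (which falls back to any installed awk, since nothing here is gawk-specific), the wording of the error message, and the choice to print the chromosome next to each length.

```shell
#!/bin/sh
# Prefer gawk; fall back to any awk on PATH (assumption: either is installed).
AWK=$(command -v gawk || command -v awk)

# col_lengths is a hypothetical helper name: it reads a whitespace-separated
# table with a header row on stdin and prints "chromosome length" per row.
col_lengths() {
  "$AWK" '
  NR == 1 {
    # Map each header name to its column number.
    for (i = 1; i <= NF; i++)
      m[$i] = i
    # Check that the required columns are present; stop early if not.
    if (!("start" in m) || !("end" in m)) {
      print "missing required column: start or end" > "/dev/stderr"
      exit 1
    }
    next
  }
  {
    # Columns are looked up by name through the map built from the header.
    l = $m["end"] - $m["start"]
    print $m["chromosome"], l
  }'
}

# Sample rows copied from the example quoted above.
out=$(printf '%s\n' \
  'chromosome     start        end' \
  '      chr1       241      53521' \
  '      chr1       363      43623' \
  '      chr2      5243     234562' | col_lengths)

printf '%s\n' "$out"
# prints:
# chr1 53280
# chr1 43260
# chr2 229319
```

Because the lookup goes through the header names, the same script keeps working if the columns are reordered in the input file, which the hardcoded -v variant does not.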

Regards,
Andy
