Shift and rotate in gawk


From: hackerb9
Subject: Shift and rotate in gawk
Date: Sun, 28 Apr 2024 20:11:46 -0700

Hi folks,

Can someone help me either prove that my gawk code is correct or explain
why it is not? I’ve been emailing back and forth with a highly respected
expert who says it is wrong and has not believed my gawk output, my timing
tests, or even my analysis of the gawk source code.

*The problem*: Shift the fields to the left by one and rotate $1 to $NF.

*My solution*: $(NF+1) = $1; for (i=1; i<NF; i++) $i=$(i+1); NF--
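
For anyone who wants to try the one-liner directly, here is a minimal sketch. The function name rotate_left is only for illustration, and it assumes the awk on your PATH handles NF-- the way gawk does (set AWK=gawk to be sure).

```shell
# Hypothetical helper: rotate the fields of each input line left by one.
rotate_left() {
    ${AWK:-awk} '{
        $(NF+1) = $1                          # append a copy of $1 as a new last field
        for (i = 1; i < NF; i++) $i = $(i+1)  # shift every field one slot to the left
        NF--                                  # drop the now-duplicated last field
        print
    }'
}

echo 'a b c d' | rotate_left
```

This should print "b c d a"; a one-field line comes back unchanged.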


   A full script that is easy to test:

   #!/bin/bash
   # rotate.awk v1    -*- awk -*-
   # Input: 1_2_3_4_..._999999_1000000
   # e.g., awk -vn=1E6 'BEGIN { OFS="_"; while (++i<=n) { $i=i }; print; exit }'

   # Read the record from stdin, or from the command-line arguments if given.
   if [[ $# == 0 ]]; then cat /dev/stdin; else echo "$@"; fi |
       ${AWK:-gawk} -F_ '
   BEGIN { print ARGV[0] }    # show which awk is running
   {
       FS="_";                # same separator as -F_ above
       OFS="XXX";
       $3 = "_3a_3b_3c_";
       print "Modifying $3 to match FS (_) to test replacement with OFS (XXX)"

       print "NF is " NF
       for (i=1; i<=3; i++)
           print i": "$i

       $(NF+1) = $1; # Rotate, comment out this line to discard $1
       for (i=1; i<NF; i++) $i=$(i+1)
       NF--;

       print ""

       print "NF is " NF
       for (i=1; i<=3; i++)        # first three fields after the rotation
           print i": "$i

       for (i=NF-2; i<=NF; i++)    # last three fields after the rotation
           print i": "$i
   }
   '

   Example output from /bin/time ./rotate.awk < numbers.1E6, where the file
   numbers.1E6 was created with:
   awk -vn=1E6 'BEGIN { OFS="_"; while (++i<=n) { $i=i }; print; exit }' > numbers.1E6

   gawk
   Modifying $3 to match FS (_) to test replacement with OFS (XXX)
   NF is 1000000
   1: 1
   2: 2
   3: _3a_3b_3c_

   NF is 1000000
   1: 2
   2: _3a_3b_3c_
   3: 4
   999998: 999999
   999999: 1000000
   1000000: 1

   0.20user 0.06system 0:00.26elapsed 101%CPU (0avgtext+0avgdata 168380maxresident)k
   0inputs+0outputs (0major+39748minor)pagefaults 0swaps



*The expert’s response*:

   1. It will replace all strings that match FS with the value of OFS.
   2. It will reconstruct $0 NF times for every line so it’ll be slow.
   3. a. Modifying a field causes awk to reconstruct $0 replacing every FS
         with OFS.
      b. For example:
         echo '1 2 3 4 5' | awk '{$(NF+1)=$1; for (i=1;i<NF;i++) { OFS="<"i">"; $i=$(i+1); print }; NF--; print }'

*My current belief*:

Of course, I could be wrong, but I currently believe the expert is
mistaken. #1 and #2 are both easy to test. #3a, if it was ever true, has
not been true in decades: setting a field merely sets a flag saying that $0
needs to be rebuilt. #3b is flawed because there are two cases in gawk
<https://git.savannah.gnu.org/gitweb/?p=gawk.git&a=search&h=HEAD&st=grep&s=rebuild_record>
where $0 is rebuilt: when OFS is set and when $0 is read. The expert’s
example does both inside its loop; that is, it introduces the very problem
it is supposed to be detecting.
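
A minimal sketch of how claim #1 can be checked on a short record, condensed from the full script above. The function name check_fs_in_field and the three-field input are only illustrative.

```shell
# Hypothetical check: does the rotation rewrite FS characters inside a field?
check_fs_in_field() {
    ${AWK:-awk} -F_ -v OFS=XXX '{
        $2 = "_x_"                            # field text that itself contains FS
        $(NF+1) = $1                          # same rotation as in the solution
        for (i = 1; i < NF; i++) $i = $(i+1)
        NF--
        print                                 # $0 is rebuilt here, joined with OFS
    }'
}

echo 'a_b_c' | check_fs_in_field
```

This should print "_x_XXXcXXXa": the underscores inside the field survive, and only the separators between fields become XXX.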

*What now?*

I have gone to great lengths to find flaws in my code. I don’t want to overwhelm
people with details that they may not even be interested in. I think
everyone can see for themselves that it works correctly, but is there
something more I could do to demonstrate that? Should I bother explaining
in detail the flaws in the expert's test code? Do people want to see timing
tests showing that this method is not slow? Would it help to demonstrate
that it is as fast or faster than the commonly seen (and incorrect)
methods, such as k=$1; $1=""; $0 = $0 k? Even though this is a gawk
specific question, should I show that my code works even on the oldest
versions of AWK still in use (e.g., macOS’s 2007 version of Brian
Kernighan’s *One True Awk*)? Would pointing to how $0 is rebuilt in the
gawk source code be useful? Do people want a patch to the current gawk git
which outputs how many times $0 has been rebuilt when it exits?
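
As one concrete illustration of why the shortcut quoted above is unsafe, here is a small sketch that runs it exactly as written next to the field-by-field rotation. The function names are only for illustration, and this shows just one failure mode; there may be others.

```shell
# The shortcut, exactly as quoted in the message.
shortcut() {
    ${AWK:-awk} '{ k = $1; $1 = ""; $0 = $0 k; print }'
}

# The field-by-field rotation, for comparison.
rotate_left() {
    ${AWK:-awk} '{ $(NF+1) = $1; for (i = 1; i < NF; i++) $i = $(i+1); NF--; print }'
}

echo 'a b c' | shortcut      # prints " b ca"
echo 'a b c' | rotate_left   # prints "b c a"
```

Emptying $1 leaves its separator behind as a leading space, and appending k with no separator glues it onto the last field, so the shortcut yields two fields instead of three.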

What am I missing and what more can I do?

Thank you,

—b9

