Re: [bug-gawk] awk big file dead loop


From: Aharon Robbins
Subject: Re: [bug-gawk] awk big file dead loop
Date: Tue, 11 Nov 2014 20:50:27 +0200
User-agent: Heirloom mailx 12.5 6/20/10

Hello. Re this:

> Date: Fri, 31 Oct 2014 17:21:58 +0000
> From: dragan legic <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] awk big file dead loop
> Status: R
>
> Program-utility works properly till the output file reaches a certain
> file size, at which point it blocks up the memory.

Although I wasn't able to get your file from you, my own attempts to
reproduce this showed me that while gawk did not leak memory, its memory
use did increase linearly with the input.  That is, the more input records
read, the more memory allocated and freed.  For the program you were using
(sent in private mail):

        gawk '!seen[$0]++' file

that should not have been happening.
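
For readers unfamiliar with the idiom, here is a minimal sketch of what that
one-liner does.  The `dedupe` wrapper name and the sample input are
illustrative assumptions, and plain `awk` is used so the sketch runs where
gawk is not installed; the idiom itself is standard awk.  Each distinct line
becomes a key in `seen`, so memory would be expected to grow with the number
of distinct lines rather than with the total number of records read.

```shell
# Hypothetical helper (not from the original report): print only the first
# occurrence of each input line, preserving input order.
dedupe() {
    # seen[$0]++ yields the count *before* incrementing, so the negation
    # is true only the first time a given line is encountered.
    awk '!seen[$0]++' "$@"
}

# Sample input with repeated lines; only the first a, b, and c are printed.
printf 'a\nb\na\nc\nb\n' | dedupe
```

The wrapper also accepts file names, for example `dedupe file`, matching the
invocation shown above.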

I was able to find the cause fairly easily, but a solution took me
a while.  Here is the fix; I will be pushing this to git soon.  The patch
should apply to 4.1.1 without much problem, although I haven't tested that.

Thanks!

Arnold
---------------------------------
diff --git a/field.c b/field.c
index 4819ea9..7b4f219 100644
--- a/field.c
+++ b/field.c
@@ -277,6 +277,12 @@ set_record(const char *buf, int cnt)
        /* copy the data */
        memcpy(databuf, buf, cnt);
 
+       /*
+        * Add terminating '\0' so that C library routines 
+        * will know when to stop.
+        */
+       databuf[cnt] = '\0';
+
        /* manage field 0: */
        unref(fields_arr[0]);
        getnode(n);
diff --git a/interpret.h b/interpret.h
index 2880433..593f11a 100644
--- a/interpret.h
+++ b/interpret.h
@@ -340,7 +340,12 @@ uninitialized_scalar:
                        lhs = r_get_field(t1, (Func_ptr *) 0, true);
                        decr_sp();
                        DEREF(t1);
-                       r = dupnode(*lhs);     /* can't use UPREF here */
+                       /* only for $0, up ref count */
+                       if (*lhs == fields_arr[0]) {
+                               r = *lhs;
+                               UPREF(r);
+                       } else
+                               r = dupnode(*lhs);
                        PUSH(r);
                        break;
 
@@ -649,11 +654,22 @@ mod:
                        lhs = get_lhs(pc->memory, false);
                        unref(*lhs);
                        r = pc->initval;        /* constant initializer */
-                       if (r == NULL)
-                               *lhs = POP_SCALAR();
-                       else {
+                       if (r != NULL) {
                                UPREF(r);
                                *lhs = r;
+                       } else {
+                               r = POP_SCALAR();
+
+                               /* if was a field, turn it into a var */
+                               if ((r->flags & FIELD) == 0) {
+                                       *lhs = r;
+                               } else if (r->valref == 1) {
+                                       r->flags &= ~FIELD;
+                                       *lhs = r;
+                               } else {
+                                       *lhs = dupnode(r);
+                                       DEREF(r);
+                               }
                        }
                        break;
 



