
Re: [bug-gawk] Memory usage of multi-dimensional arrays


From: EDGAR, ADAM
Subject: Re: [bug-gawk] Memory usage of multi-dimensional arrays
Date: Thu, 22 Mar 2018 19:03:34 +0000

4 kB is actually very small for a hash table. You need to allocate space for
pointers at the very least; at 8 bytes per pointer, the hash table would
initially have just 512 slots. The chance of collisions and chaining starts
slowing a hash table down at around 40% utilization, as I recall. So, to avoid
many reallocs, you go ahead and allocate enough space to handle 200+ values
efficiently.
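
The arithmetic behind those numbers, as a rough sketch in awk (the 40%
threshold is the recollection above, not a documented gawk constant):

    ## sizing math for a hypothetical 4 kB table of 8-byte pointers
    BEGIN {
        slots = 4096 / 8           # 512 slots
        threshold = 0.40           # utilization where chaining starts to hurt
        print slots * threshold    # ~204 entries fit before a resize
    }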

ASE

On 3/22/18, 1:34 PM, "bug-gawk on behalf of address@hidden" <address@hidden on behalf of address@hidden> wrote:

    A hash table with a single element in it shouldn't take up 4K...
    
    Wolfgang Laun <address@hidden> wrote:
    
    > Why surprised? A hash table initially allocating just 4 kB is rather
    > conservative. Or does the nesting level influence the allocation quantity?
    >
    > I admit that I've been answering this question based on general
    > experience, but I can't imagine a better strategy for a sound
    > implementation.
    >
    > -W
    >
    > On 22 March 2018 at 17:22, <address@hidden> wrote:
    >
    > > Thanks for the follow-up; sorry I didn't reply earlier. I'm surprised
    > > that each array takes 4 KB; if I have some time I will try to figure
    > > out why.
    > >
    > > Arnold
    > >
    > > Wolfgang Laun <address@hidden> wrote:
    > >
    > > > You are welcome.
    > > >
    > > > I might have added that there shouldn't be any additional memory
    > > > consumption with the 2nd, 3rd, ... array element for each row, since
    > > > these will be hashed into the 4 kB area reserved when a row is
    > > > allocated. Only when the hash table reaches a certain limit will you
    > > > observe another significant increase. Details depend on the
    > > > implementation. You might check the literature on hash tables in
    > > > general if you aren't familiar with this technique.
    > > >
    > > > -W
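
A quick way to observe this behaviour for yourself, in the style of the test
scripts later in the thread (the row and column counts here are arbitrary,
chosen only for illustration):

    ## hypothetical experiment: 1000 rows of 200 columns each; per the
    ## scheme described above, per-row memory should stay near the initial
    ## 4 kB until each row's hash table passes its resize threshold
    BEGIN { for (i = 0; i < 1000; i++) for (j = 0; j < 200; j++) X[i][j] = 0 }
    ## wait to allow for memory analysis, as in the examples below
    END { while (1 == 1) Y = 0 }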
    > > >
    > > >
    > > >
    > > >
    > > > On 22 March 2018 at 16:10, Christian Schneider <address@hidden>
    > > > wrote:
    > > >
    > > > > Hi Wolfgang,
    > > > >
    > > > > Thank you very much for the clarification.  To be honest, without
    > > > > your extra explanations, I would probably not have understood the
    > > > > quote.
    > > > >
    > > > > Best,
    > > > > Christian
    > > > >
    > > > > On Fri, 2018-03-16 00:27:32 (-0700), Wolfgang Laun wrote:
    > > > > > It does say something about the behaviour: "...implementations,
    > > > > > which *typically use hash tables* to store array elements and
    > > > > > values."
    > > > > >
    > > > > > This means that in the two-dimensional case you have 100000 hash
    > > > > > tables.
    > > > > >
    > > > > > It would still be possible to create a big two-dimensional array,
    > > > > > but swapping would set in and it would become very slow.
    > > > > >
    > > > > > No, there is no way of making that behaviour more efficient.
    > > > > >
    > > > > > Either map two index values to a single value using a function,
    > > > > > or you are using the wrong tool for your task.
    > > > > >
    > > > > > Cheers
    > > > > > Wolfgang
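
One standard awk way to do that mapping is the classic (i, j) subscript,
which joins the indices with SUBSEP into a single string key, so all
elements live in one flat hash table. A minimal sketch of Example 2 (below)
rewritten that way, where the memory saving is the expected outcome rather
than something measured here:

    ## same 100k elements, keyed as I SUBSEP I in one flat table
    ## instead of one ~4 kB sub-table per row
    BEGIN { for (I = 0; I < 100000; I++) X[I, I] = 0 }
    ## wait to allow for memory analysis
    END { while (1 == 1) Y = 0 }

The trade-off is that X[I] no longer exists as a sub-array you can delete or
iterate as a unit; recovering the indices from a flat key takes
split(key, parts, SUBSEP).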
    > > > > >
    > > > > > On 16 March 2018 at 02:23, Christian Schneider <address@hidden>
    > > > > > wrote:
    > > > > >
    > > > > > > Hi all,
    > > > > > >
    > > > > > > I encountered an interesting behaviour with multi-dimensional
    > > > > > > arrays and was wondering if this is expected or a bug:
    > > > > > >
    > > > > > > Example 1:
    > > > > > >
    > > > > > > ## create array with 100k elements
    > > > > > > BEGIN { for (I = 0; I < 100000; I++) X[I] = 0 }
    > > > > > > ## wait to allow for memory analysis
    > > > > > > END { while (1 == 1) Y = 0 }
    > > > > > >
    > > > > > > Result:
    > > > > > >
    > > > > > > The memory usage (looked at with "ps" and all its limitations)
    > > > > > > is on the order of a few (~8) bytes per element, as expected.
    > > > > > >
    > > > > > > Example 2:
    > > > > > >
    > > > > > > ## create multi-dimensional array with 100k elements
    > > > > > > BEGIN { for (I = 0; I < 100000; I++) X[I][I] = 0 }
    > > > > > > ## wait to allow for memory analysis
    > > > > > > END { while (1 == 1) Y = 0 }
    > > > > > >
    > > > > > > Result:
    > > > > > >
    > > > > > > Uses a few (~4) kB per element.  This also means an array with
    > > > > > > 1m elements cannot even be created on a machine with 8 GB RAM
    > > > > > > anymore.
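
The rough totals implied by taking ~4 kB per row at face value (per-element
bookkeeping overhead comes on top of this, which is presumably what exhausts
8 GB in practice):

    ## back-of-envelope memory totals for the observation above
    BEGIN {
        print 100000 * 4096 / 2^20, "MiB"    # ~391 MiB for the 100k example
        print 1000000 * 4096 / 2^30, "GiB"   # ~3.8 GiB for 1m rows
    }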
    > > > > > >
    > > > > > > I could not find any documentation on that behaviour.  If it is
    > > > > > > considered "normal", could you mention this somewhere, please?
    > > > > > > Is there a way to make these arrays more efficient?
    > > > > > >
    > > > > > > Thank you very much for any comments.
    > > > > > >
    > > > > > > Best regards,
    > > > > > > Christian
    > > > > > >
    > > > > > > P.S.: Please CC me, as I am not subscribed.
    > > > > > >
    > > > > > > P.P.S.: version:
    > > > > > > GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
    > > > > > > from: Debian 9.4, amd64
    > > > > > >
    > > > > > >
    > > > >
    > >
    
    

