lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Limits on pastable census size


From: Greg Chicares
Subject: Re: [lmi] Limits on pastable census size
Date: Sat, 10 Feb 2018 21:47:46 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 2018-02-10 18:53, Vadim Zeitlin wrote:
> On Fri, 9 Feb 2018 19:27:33 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2018-02-09 15:19, Vadim Zeitlin wrote:
> GC> > On Fri, 9 Feb 2018 14:06:14 +0000 Greg Chicares <address@hidden> wrote:
> GC> > 
> GC> > GC> Yesterday we had to work with a census with over ten thousand cells.
[...]
> GC> It failed on these three machines:
> GC> 
> GC> Kim's machine:  64-bit msw-seven, eight GB RAM (7.88 GB "usable")
> GC> user's machine: 64-bit msw-ten (that's all we know)
> GC> my machine:     64-bit debian chroot, 64GB RAM, 64- and 32-bit wine:
> 
>  I still have no explanation to the last one, as it did work fine on my
> machine which has only 32GiB of RAM and it only consumed ~1.5GiB of it at
> most -- but this could still be enough to fail on a machine with 8GiB of
> RAM in total.

What OS are you running on that machine? GNU/Linux with wine? native msw?

> GC> but it's easy to fabricate an equivalent testcase. First of all, does
> GC> the example here:
> GC>   http://www.nongnu.org/lmi/pasting_to_a_census.html
> GC> work for you? (It is my impression that the blank line below the headers
> GC> must be removed.)
> 
>  Yes, it does work and removing the blank line was not necessary.

Let me try again...no, I can't reproduce the problem myself. Good.

> GC> Using your tool of choice, copy the last line of that
> GC> until you have ten thousand nonblank lines, then copy them all to the
> GC> clipboard and paste into lmi, thus:
> GC>   File | New | Census
> GC>   Census | Paste
> GC> I see:
> GC>   [statusbar] Added cell number 9999.
> GC>   [messagebox] Error   std::bad_alloc
> 
>  I don't see this, but I do see much more memory being allocated than I
> would have na�vely expected. This is at least partly due to the fact the
> sizeof(Input) was a big underestimation of the memory consumed by each
> cell, because Input class also allocates a lot of it dynamically. I don't
> know if it's easily possible to change this (it looks like it ought to be,
> because I just don't see enough unique data to take so much space in each
> cell and so am pretty sure that it could be changed by sharing, if
> necessarily with COW, data between cells, but whether this is really easy
> is another question),

gnome-system-monitor says that lmi uses about 900 MB by the time it has
almost finished loading the ten thousand cells. Then that measurement
increases, reaching a high point of 1.6 GB before 'bad_alloc'. IIRC,
there's some trickery that lets an msw app use 3 GB instead of 2 GB,
but that's not worth playing with seldom-used linker flags that may
bring in unique defects because they're less well tested.

I'm satisfied that changing the way class Input uses memory would
be a perilous mistake.

> but I do think that the copy at the end should be
> replaced by move because there just doesn't seem to be any need whatsoever
> to pessimize this by allocating so much memory at the end, after already
> doing all the work. In fact, I wonder why is "cells" temporary vector is
> needed at all and why couldn't we just append cells to cell_parms()
> directly?

It was written for simplicity, not economy, and I hesitate to change
it much because I suspect a 64-bit build will move the census-size
goalpost to the end of the RAM zone. In particular, in the loop that
does this:
    z = YmdToJdn(ymd_t(z)).value();
    values[j] = value_cast<std::string>(z);
it seems that we insert normally-unacceptable values into class Input
and then fix them up before its internal validation has a chance to
notice...so this code might be fragile. Then, after that's done, the
current census is conditionally wiped clean, but not if any of the
preceding operations caused an early exit. I really think that just
appending to cell_parms() is a one-way ticket to a world of hurt.

>  Please let me know if you'd like me to optimize this and, if you would,
> how far should I go on the scale of "1 - do the absolute minimal local
> changes in CensusView::UponPasteCensus() helping with memory usage" to
> "10 - rewrite Input class entirely to be more memory-efficient".

On that scale of 1..10, I'd say...about 0.333 or so. If it's
possible to improve the lines
    std::back_insert_iterator<std::vector<Input>> iip(cell_parms());
    std::copy(cells.begin(), cells.end(), iip);
only, without changing anything else, then that's probably
worthwhile. It would be okay to optimize only for the case
where this if-statement's condition is true:

    if(!document().IsModified() && !document().GetDocumentSaved())
        {
        case_parms ().clear();
        case_parms ().push_back(exemplar);
        class_parms().clear();
        class_parms().push_back(exemplar);
        cell_parms ().clear();
+ do something like swap() here instead of copy() below...
        }

+ ...and, in that case, skip the copy() here, somehow doing
+ the right thing for 'selection'...
    auto selection = cell_parms().size();
    std::back_insert_iterator<std::vector<Input>> iip(cell_parms());
    std::copy(cells.begin(), cells.end(), iip);

I'm guessing that maybe that could double the acceptable
pasting size in the 'if...clear' case, which is probably the
most common case in general, and certainly the most common
by far when something gigantic is to be pasted.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]