lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Big census [Was: Census-manager Update()]


From: Greg Chicares
Subject: Re: [lmi] Big census [Was: Census-manager Update()]
Date: Sun, 02 Nov 2014 03:18:31 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 2014-11-02 01:40Z, Greg Chicares wrote:
> 
> By private email I'll send you a case of similar size that includes
> no proprietary or personal information.

Here's another experimental use for that big census. [The results I
report below are for an actual production case of almost exactly the
same size, but that probably doesn't make much difference.]

While I was cleaning up some really old code in the census manager,
I came across this:

//    cell_parms().swap(expurgated_cell_parms); // TODO ?? Would this be better?
    cell_parms() = expurgated_cell_parms;

At a quick glance, yes, obviously swap() must be better...I guess.
But before committing such a change, I decided to measure it.

Try deleting one cell at a time, repeatedly. Here are the timings
I got, in milliseconds, for: operator=() ; swap as given in the
alternative above; and then--grasping at straws because swap()
did poorly--swap() followed by calling
  expurgated_cell_parms.resize(0);
. (The only reason I tried resize() is that the (not so useful)
"Mem Usage" reported by the msw-xp "task manager" went way up
with swap(); it shouldn't really make a difference, because that
vector soon goes out of scope and gets destroyed.)

                   swap +
  assign     swap  resize
   39793    45055   36759
   38946   108912   82720
   40189    58270   46892
   40065            76559
   39712            45839
   47603            74584
   77469

I didn't perform a uniform number of trials each way because
40000 msec is long enough to be irksome. Maybe there aren't
enough trials, but it does appear that using swap() increases
the variance quite a bit, without decreasing the mean.

But what's really important is that either way takes too long,
so I tried the patch below[0] and observed:

Same test as above:
   6262
   6275
   6234
   6241
"Harsh" test: deleted ten non-contiguous cells near beginning
  46973
  47217
  46783

The speed is comparable even with the "harsh" test, and an order
of magnitude better for single-cell deletion. But there must exist
an algorithm that delivers about the same performance for the
"harsh" test as for the simple case (because my patch calls
std::vector::erase() in a loop).

Searching online, I found that others had the same problem: that
the "erase-remove" idiom seems like a good fit, but what we're
dealing with here is indexes, not elements:

http://stackoverflow.com/questions/4115279/most-efficient-way-of-erasing-deleting-multiple-stdvector-elements-while-retai
http://stackoverflow.com/questions/6609547/erasing-elements-in-stlvector-by-using-indexes/21148808

This seems like a better idea:

http://www.drdobbs.com/cpp/object-swapping-part-2-algorithms-can-of/232602979

Sadly for me, I haven't the time to work on this any more, but
maybe you can. As a first step, I'd like to commit the patch
below[0] unless you see something incorrect in it. I'm really
starting to think, though, that maybe we should have a container
of smart pointers instead--but that's a big step, and I can't
make time to consider doing it myself.

---------

[0] "so I tried the patch below":

Index: census_view.cpp
===================================================================
--- census_view.cpp     (revision 6018)
+++ census_view.cpp     (working copy)
@@ -1491,23 +1491,12 @@

     LMI_ASSERT(cell_parms().size() == n_items);

-    std::vector<Input> expurgated_cell_parms;
-    expurgated_cell_parms.reserve
-        (n_items - n_sel_items
-        );
-
-    for(unsigned int j = 0; j < cell_parms().size(); ++j)
+    for(int j = erasures.size() - 1; 0 <= j; --j)
         {
-        if(!contains(erasures, j))
-            {
-            expurgated_cell_parms.push_back(cell_parms()[j]);
-            }
+        cell_parms().erase(erasures[j] + cell_parms().begin());
         }
-    LMI_ASSERT(expurgated_cell_parms.size() == n_items - n_sel_items);
+    LMI_ASSERT(cell_parms().size() == n_items - n_sel_items);

-//    cell_parms().swap(expurgated_cell_parms); // TODO ?? Would this be 
better?
-    cell_parms() = expurgated_cell_parms;
-
 #if !wxCHECK_VERSION(2,9,3)
     // Remove selection to work around wx-2.9.2 bug in GetSelections()
     // (we'll set it again below).




reply via email to

[Prev in Thread] Current Thread [Next in Thread]