Re: [Monotone-devel] some observations about roster node IDs


From: Nathaniel Smith
Subject: Re: [Monotone-devel] some observations about roster node IDs
Date: Thu, 28 Jun 2007 20:15:02 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, Jun 28, 2007 at 03:07:47PM -0700, Zack Weinberg wrote:
> Performance testing on the file_path/split_path changes led me to
> think about node_id lookups.  Some things I've noticed:
> 
> regenerate_caches (on a monotone database) spends nearly seven
> percent of its runtime in the dynamic_cast<> implementation, going
> from node_t to dir_t and/or file_t.

You may want to double-check your numbers on a workload you're really
interested in, like 'pull' -- 'regenerate_caches', unless someone has
gone and fixed it when I wasn't looking, uses an arbitrary toposort.
Normally for things like pull we process revisions using our smart
height-based toposort, which has nice clustering behavior.  This means
that often (anyone want to measure?) when we want to load a roster,
its immediate parent is already in the roster cache.

We could do even better here if we wanted: instead of relying on the
roster cache's LRU, we could keep explicit track of how many times a
roster will be needed for the operation at hand (e.g., in pull or
regenerate_caches, a roster is needed exactly once for each child
revision), and pin rosters in memory for exactly the right amount of
time.
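
Concretely, the bookkeeping could look something like the following
rough sketch (invented names and types, not code from our tree):

    #include <map>
    #include <memory>
    #include <string>

    // Invented stand-ins, just to sketch the idea.
    struct roster_t {};
    typedef std::string revision_id;
    typedef std::shared_ptr<roster_t> roster_ptr;

    // Pin each parent roster for exactly as many uses as it has
    // child revisions, instead of hoping the LRU keeps it around.
    struct pinned_roster_cache
    {
      std::map<revision_id, std::pair<roster_ptr, int> > pinned;

      // 'uses' is the number of children this revision has in the
      // graph we are about to walk, computed up front.
      void pin(revision_id const & rid, roster_ptr ros, int uses)
      {
        pinned[rid] = std::make_pair(ros, uses);
      }

      // Fetch a parent roster for one child revision; the last use
      // drops the pin, so the roster lives exactly as long as needed.
      roster_ptr use(revision_id const & rid)
      {
        std::map<revision_id, std::pair<roster_ptr, int> >::iterator
          i = pinned.find(rid);
        if (i == pinned.end())
          return roster_ptr();  // caller falls back to the database
        roster_ptr ros = i->second.first;
        if (--i->second.second == 0)
          pinned.erase(i);
        return ros;
      }
    };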

Where are these dynamic_casts coming from?  Roster loading seems like
a plausible source: loading a roster requires many, many path
lookups, and each path lookup does a dynamic_cast on the node_t for
every intermediate directory.  Roster writing similarly requires
O(tree) dynamic_casts, and poor cache behavior for roster writing
means that we'll be writing too many full rosters (as dirty rosters
get evicted, only to be re-loaded and deltafied later).
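
For reference, the shape of the cost is roughly the following sketch
(types invented for illustration, not the real definitions):

    #include <cassert>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // Invented approximations of the roster node types, just to
    // show where the casts land.
    struct node { virtual ~node() {} };
    typedef std::shared_ptr<node> node_ptr;
    struct dir_node : node { std::map<std::string, node_ptr> children; };
    struct file_node : node { std::string content; };

    // Resolving "a/b/c" does one dynamic_pointer_cast per
    // intermediate directory: O(depth) casts per lookup, and
    // O(tree) casts per full roster load or write.
    node_ptr lookup(node_ptr root, std::vector<std::string> const & path)
    {
      node_ptr n = root;
      for (size_t i = 0; i < path.size(); ++i)
        {
          std::shared_ptr<dir_node> d =
            std::dynamic_pointer_cast<dir_node>(n);
          assert(d);  // every intermediate component is a directory
          std::map<std::string, node_ptr>::const_iterator j =
            d->children.find(path[i]);
          assert(j != d->children.end());
          n = j->second;
        }
      return n;
    }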

> We are currently doing node ID lookups with a hybrid_map, which is
> both an STL map (i.e. a red-black tree) and a hash table.  Lookups do not
> show up in my profiles, but I suspect they would for other workloads.

What workloads are you thinking of?  It seems plausible to me that
they would, but it also seems plausible that things like pull or
regenerate_caches are the worst case, given that they do
whole-roster operations over and over in their inner loop.
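
(For anyone who hasn't looked at it: the hybrid_map idea amounts to
roughly the sketch below.  This is not the real implementation; the
point is just that every entry lives in both an ordered map and a
hash table, keeping sorted iteration and constant-time point
lookups.)

    #include <map>
    #include <unordered_map>

    // Sketch only.  std::map nodes are stable in memory, so the
    // hash table can safely point into them.
    template <typename K, typename V>
    struct hybrid_map_sketch
    {
      std::map<K, V> ordered;             // sorted iteration
      std::unordered_map<K, V *> hashed;  // fast point lookup

      void insert(K const & k, V const & v)
      {
        V & slot = (ordered[k] = v);
        hashed[k] = &slot;
      }

      V * find(K const & k)
      {
        typename std::unordered_map<K, V *>::iterator i = hashed.find(k);
        return i == hashed.end() ? nullptr : i->second;
      }
    };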

> * It would not be totally crazy to use two vectors for the node ID ->
> node_t map: one for true node IDs, one for temp node IDs.  (There are
> no temp node IDs in any of the graphs, of course.  Temp node IDs start
> at 2^31, so clearly we cannot use one vector for both.)  The
> true-node-ID vector would have a lot of unused entries, but I suspect
> its memory usage would actually be better than what we are currently
> doing, and it doesn't cost too much to skip 3000 unused slots in an
> iteration over an 8000-element vector.  One does worry about scaling.
> Some sort of sparse array structure might work better.

Do you know how much memory we're losing to the hash table as compared
to a dense vector?  Any idea how it compares to the memory we're
losing to, say, malloc overhead from the shared_ptr/std::string-happy
structures that the hash table is storing?  (I have no idea how to
measure either of these quantities; I wish I did.  Something like a
database::roster_size_estimator::operator() would be totally
awesome.)
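
For what it's worth, the two-vector scheme you describe would look
something like the sketch below (names invented; the 2^31 temp-node
boundary is from your description above):

    #include <cassert>
    #include <memory>
    #include <vector>

    // Invented stand-ins for the roster node map.
    struct node { virtual ~node() {} };
    typedef std::shared_ptr<node> node_ptr;
    typedef unsigned int node_id;

    node_id const first_temp_node = 1u << 31;

    struct two_vector_node_map
    {
      std::vector<node_ptr> true_nodes;  // indexed by id; may have holes
      std::vector<node_ptr> temp_nodes;  // indexed by id - first_temp_node

      node_ptr get(node_id nid) const
      {
        if (nid < first_temp_node)
          {
            assert(nid < true_nodes.size());
            return true_nodes[nid];
          }
        node_id off = nid - first_temp_node;
        assert(off < temp_nodes.size());
        return temp_nodes[off];
      }

      void put(node_id nid, node_ptr n)
      {
        bool is_true = nid < first_temp_node;
        std::vector<node_ptr> & v = is_true ? true_nodes : temp_nodes;
        node_id off = is_true ? nid : nid - first_temp_node;
        if (off >= v.size())
          v.resize(off + 1);  // the holes are the memory tradeoff
        v[off] = n;
      }
    };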

> * We should seriously consider separating dir from file nodes so we're
> not blowing so much time in dynamic_cast.  I made a stab at this
> locally; it gets messy, but not, I think, *too* messy.
> 
> * We might want to think about packing node IDs better.  I can think
> of several algorithms of varying degrees of cleverness, but suspect
> someone else has a better sense of how to do it.

Not sure what you mean here.  Within a database, node_ids are
perfectly packed -- we use every node_id from 1 up to the current
maximum.  We might arrange to make the space used by any given
roster less fragmented, I guess -- the obvious way that comes to
mind would be to process revisions in a clustered-toposorty sort of
way, so that all the additions within one branch of development get
adjacent parts of the node_id space :-).  Since pull already
processes revisions in roughly that order, you might want to run
your scripts on a database after doing a local fresh pull, as
opposed to after a regenerate_caches.
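
Coming back to your dir/file-split bullet: one cheap variant that
avoids the RTTI cost without fully separating the types would be an
explicit kind tag checked by hand, then a static_cast.  A minimal
sketch, with invented types:

    #include <cassert>
    #include <memory>

    // Invented stand-ins: tag each node with an explicit kind, so a
    // downcast is a compare-and-branch plus static_cast instead of
    // an RTTI walk.
    struct node
    {
      enum kind_t { dir_kind, file_kind };
      kind_t kind;
      explicit node(kind_t k) : kind(k) {}
      virtual ~node() {}
    };
    struct dir_node : node { dir_node() : node(dir_kind) {} /* children... */ };
    struct file_node : node { file_node() : node(file_kind) {} /* content... */ };

    inline std::shared_ptr<dir_node>
    as_dir(std::shared_ptr<node> const & n)
    {
      assert(n->kind == node::dir_kind);
      return std::static_pointer_cast<dir_node>(n);
    }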

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen



