monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] some observations about roster node IDs


From: Nathaniel Smith
Subject: Re: [Monotone-devel] some observations about roster node IDs
Date: Thu, 28 Jun 2007 22:50:00 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, Jun 28, 2007 at 09:56:24PM -0700, Zack Weinberg wrote:
> On 6/28/07, Nathaniel Smith <address@hidden> wrote:
> >You may want to double-check your numbers on a workload you're really
> >interested in, like 'pull' -- 'regenerate_caches', unless someone has
> >gone and fixed it when I wasn't looking, uses an arbitrary toposort.
> >Normally for things like pull we process revisions using our smart
> >height-based toposort, which has nice clustering behavior.  This means
> >that often (anyone want to measure?) when we want to load a roster,
> >its immediate parent is already in the roster cache.
> 
> Unfortunately, oprofile is still not working for me (I tried your
> suggestion but it didn't help any) and I suspect the server side of a
> pull will not appreciate the client running under cachegrind. I'll see
> what I can do with boring old gprof...

Blah.  I assume you've also checked that you're not using one of the
recent kernels that broke oprofile.

IME the server side of a pull is fine with the client being run under
cachegrind, though.

> >Where are these dynamic_cast's coming from?  Roster loading seems like
> >a plausible source, since loading a roster requires doing many many
> >path lookups, and doing a path lookup requires doing dynamic_cast's on
> >the node_t's corresponding to each intermediate directory.  Roster
> >writing similarly requires O(tree) dynamic_casts, and poor cache
> >behavior for roster writing means that we'll be writing too many full
> >rosters (as dirty rosters get evicted, only to later be re-loaded and
> >deltafied).
> 
> Yeah, it's all path lookup, roster writeout, and deltificiation.
> 
> >> We are currently doing node ID lookups with a hybrid_map, which is
> >> both a STL map (i.e. red-black tree) and a hashtable.  Lookups do not
> >> show up in my profiles, but I suspect they would for other workloads.
> >
> >What workloads are you thinking of?  It seems plausible to me that
> >they would, but it also seems plausible that things like
> >pull or regenerate_rosters are a worst-case, given that they are doing
> >whole-roster operations over and over in their inner loop.
> 
> Pull, checkout, large updates, merge were what I was thinking of.
> unify_rosters, but I don't know what hits that.

large pulls in the limit are the same as regenerate_rosters, modulo
the stuff I mentioned about caches.  unify_rosters is used by
make_roster_for_merge_revision, which is mostly called by
pull/regenerate_rosters (though also called once for all workspace
operations in a merge workspace).  Update/merge/other workspace
operations only have to touch one or two rosters, though, as opposed
to 10,000, so they're generally much less sensitive to the speed of
the roster code itself.  (Esp. for any workspace operation, workspace
scanning is going to dominate any piddly hash-table lookups.)

-- Nathaniel

-- 
Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould




reply via email to

[Prev in Thread] Current Thread [Next in Thread]