[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Monotone-devel] Roster handling in the code
From: |
Stephen Leake |
Subject: |
Re: [Monotone-devel] Roster handling in the code |
Date: |
Sat, 25 Jul 2009 05:56:22 -0400 |
User-agent: |
Gnus/5.11 (Gnus v5.11) Emacs/22.2 (windows-nt) |
Patrick Georgi <address@hidden> writes:
> Am 25.07.2009 04:07, schrieb Stephen Leake:
>> Patrick Georgi<address@hidden> writes:
>>
>>> The current code takes two rosters, compares them, and registers the
>>> differences. The problem is that this approach scales by roster size
>>> (which is directly dependent on the tree size). The problem is, that I'm
>>> working with trees with 50000 files - comparing all of their entries
>>> just to find the 5 that actually changed is quite a waste of time.
>>> As a test, I disabled the roster deltification code, and got quite some
>>> speedup (pushing 10 revisions with put_revision in approx. half the
>>> time), so there's definitely room for improvement.
>>>
>> I think the main motivation for roster_deltas is database size. If we
>> can store a new revision as an old revision plus a small delta, that's
>> a significant size saving. As with anything that saves size, it costs
>> computation time. Which doesn't mean the current trade-off is optimal.
>> Disk space is certainly cheaper than it used to be. But so is CPU time
>> :).
>> Perhaps a user tuning option could be provided?
>>
> The problem in this case, that large trees affect both of them very
> negatively. When I did the profiling, I disabled roster_delta
> generation (#if 0/#endif around a couple of lines in database.cc) to
> see if I'm really on the right track.
> But this "optimization", while reasonable for small trees, really
> hurts large trees, too. Every roster contains information about 50000
> files, and that sums up.
Which suggests a possible space savings; roster_delta could compute the
change in the manifest as well, so each revision doesn't need to store
the full manifest.
Hmm. I don't think the manifests are actually stored anywhere.
Apparently they used to be, but not anymore. So it looks like that
optimization has already been done.
> Yes, but look how it's done. It's in roster_delta.cc,
> make_roster_delta_t(...):
> It creates a datastructure containing information about _all_ files in
> the roster (ie. all 50000 in my case), and then loops over them and
> notes the differences.
Ok; that's 50000 iterations. I just wouldn't call them "lookups",
since they don't fetch stuff from the database unless there is a change.
> The cset already contains a list of all changes between the two
> revisions, so the code could walk a changeset to figure out which
> roster entries to handle. As most changesets are rather small, that
> should give a nice speed up.
Yes, that makes sense.
--
-- Stephe