Re: [Monotone-devel] Roster handling in the code

monotone-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Roster handling in the code

From:	Patrick Georgi
Subject:	Re: [Monotone-devel] Roster handling in the code
Date:	Sat, 25 Jul 2009 09:39:26 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.1) Gecko/20090715 Thunderbird/3.0b3

Am 25.07.2009 04:07, schrieb Stephen Leake:

Patrick Georgi<address@hidden>  writes:

The current code takes two rosters, compares them, and registers the
differences. The problem is that this approach scales by roster size
(which is directly dependent on the tree size). The problem is, that I'm
working with trees with 50000 files - comparing all of their entries
just to find the 5 that actually changed is quite a waste of time.
As a test, I disabled the roster deltification code, and got quite some
speedup (pushing 10 revisions with put_revision in approx. half the
time), so there's definitely room for improvement.

I think the main motivation for roster_deltas is database size. If we
can store a new revision as an old revision plus a small delta, that's
a significant size saving. As with anything that saves size, it costs
computation time. Which doesn't mean the current trade-off is optimal.
Disk space is certainly cheaper than it used to be. But so is CPU time
:).
Perhaps a user tuning option could be provided?

The problem in this case, that large trees affect both of them verynegatively. When I did the profiling, I disabled roster_delta generation(#if 0/#endif around a couple of lines in database.cc) to see if I'mreally on the right track.But this "optimization", while reasonable for small trees, really hurtslarge trees, too. Every roster contains information about 50000 files,and that sums up.

So both potential optimizations, the current one, and disablingroster_deltas, are acceptable for small trees. But small trees are notthe case I'm looking at.

...
My current approach is to try to create the roster_delta while the new
roster is created out of the old (using
cset.apply_to(editable_roster_for_{non,}merge)). The most appropriate
place for this seems to be editable_roster_for_{non,}merge, as all data
is passed to it as some point of the operation.

I have wondered why there were two different "delta" mechanisms; one
that creates changesets, and one that creates roster deltas. I thought
at one time that one could be eliminated. But then I realized both are
needed. I'm not sure I can remember the details, though :(.

I think at the time where changesets are created and written to thedatabase, some data that's stored in rosters (and thus roster_deltas)isn't around yet, for example node ids.

It could be that combining them would lead to a time savings.

The ability to generate one out of the other, given the right set ofsource data, might be enough. As changesets are managed first, myattempt is to efficiently generate roster_deltas out of changesets plusthe related rosters.

But I'm looking for a more direct approach: Is there a way to get figure
out the revision that matches a roster?

Rosters are fetched in the first place by revision id, so I'm not sure
what you are asking.

I was deep inside the code, and at some point, it was only shufflingrosters, with no revision ids in sight. I just looked a couple of levelsfurther up, and everything needed seems to be there.

For example:

   db.get_roster(left_rid, left_roster, left_marking_map);
   db.get_roster(right_rid, right_roster, right_marking_map);

This is from the merge code. If you could point to the section of the
sync code you are worried about, it might help.

This gave me an idea: database::put_roster handles the deltification ofrosters. The revision ids are available there, so I could hand over thechangesets to the delta_rosters(...) call there. See below.

5 lookups instead of 50000.

Computing a roster delta should not involve retrieving the file text
for every file; doesn't it check if the file id (hash of the contents)
has changed first? Those are given in the roster, so comparing them
doesn't count as a "lookup".

Yes, but look how it's done. It's in roster_delta.cc,make_roster_delta_t(...):It creates a datastructure containing information about _all_ files inthe roster (ie. all 50000 in my case), and then loops over them andnotes the differences.The cset already contains a list of all changes between the tworevisions, so the code could walk a changeset to figure out which rosterentries to handle. As most changesets are rather small, that should givea nice speed up.

What are you defining as a "lookup"?

This iterator is used to look up the changes: parallel::iter<node_map>i(from.all_nodes(), to.all_nodes());



Regards,
Patrick

[Prev in Thread]

Current Thread

[Next in Thread]

[Monotone-devel] Roster handling in the code, Patrick Georgi, 2009/07/24
- Re: [Monotone-devel] Roster handling in the code, Stephen Leake, 2009/07/24
  - Re: [Monotone-devel] Roster handling in the code, Patrick Georgi <=
    - Re: [Monotone-devel] Roster handling in the code, Nathaniel Smith, 2009/07/25
    - Re: [Monotone-devel] Roster handling in the code, Stephen Leake, 2009/07/25

Prev by Date: [Monotone-devel] resolving workspace conflicts
Next by Date: Re: [Monotone-devel] Roster handling in the code
Previous by thread: Re: [Monotone-devel] Roster handling in the code
Next by thread: Re: [Monotone-devel] Roster handling in the code
Index(es):
- Date
- Thread