monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Re: suturing, resurrection, renaming and unique fil


From: William Uther
Subject: Re: [Monotone-devel] Re: suturing, resurrection, renaming and unique file identifiers
Date: Thu, 29 May 2008 15:25:37 +1000

Mikkel,

  I haven't seen any replies to your suggestion yet.

Conceptually, removing diediedie merge isn't an issue. The issue is doing it in a computationally efficient way. Simply adding new IDs doesn't change this.

In general, the whole point of fileIDs is that it is a constant time operation to see if two of them are the same. If you have to search some history graph, then things are already significantly less efficient. Having said that, I'm not up on bloom filters (well, I just read the wikipedia entry).

With file suturing, we are looking to implement a mechanism whereby we can handle multiple node-ids for a single conceptual file. I was thinking some form of http://en.wikipedia.org/wiki/Disjoint-set_data_structure was what we needed. A bloom filter might work, but the fact that it is a stochastic data structure worries me. You'd either need to have a large number of redundant hashes to reduce the probability significantly, or you'd need to have some double-check after matching.

I must say I'm a little unclear about your other suggestions. I can see that they might work, but would seem to be horribly inefficient compared to what monotone currently has. I may be wrong about that, but I think you'd need to show your method efficient (less than a large constant times O(n) for matching n node-ids between two revisions, and less than O(nr) space where n is the number of node ids and r is the number of revisions being stored).

Cheers,

Will        :-}

On 29/05/2008, at 3:07 AM, Mikkel Fahnøe Jørgensen wrote:

2008/5/27 Mikkel Fahnøe Jørgensen <address@hidden>:
When a change is merged to a target revision, we can find
the merge candidate not by comparing file id's but by finding common
history in sliding the change forward along the path of file id
changes until we reach one or more files in the current merge target
revision.

I should perhaps clarify:

This is not simply sliding forward along a path. First a common
ancestor must be identified between files in the two revisions.
This ancestor is useful for merging, but the purpose here is to
establish a common identity between two files.

If we only have a single unique file id, we only need the common
ancestor due to merging since we can directly establish that the file
is present in both revisions.

In the general case with progressive file id's, we have a routing problem:
For each file in one revision we must find a path to one or more files
in the other revision.
If we find no such path, we introduce a new file, or try to join
existing files based on user input, path names and heuristics, or
report a conflict.

An idea for optimizing such routing could be to use a bloom filter
(known from text indexing and dictionaries) that tracks all id's of
all past file revisions. When can then trace back a file id from a
parallel branch until the bloom filter matches an ancestor file id,
which is likely a common ancestor (bloom filters have false positives
but never false negatives). But there could well be better ways to
pre-calculate relevant routing information.

Once we have the common ancestor, the remaining problem is sliding
forward. But we must eliminate those forward traces that does not lead
to our desired revision, hence it may be better to view the problem
fully as a routing problem instead of seeking ancestors.

Mikkel
_______________________________________________
Monotone-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/monotone-devel





reply via email to

[Prev in Thread] Current Thread [Next in Thread]