#
# patch "ChangeLog"
# from [461c25a92640c41f5d01de41ee652f4ba900980b]
# to [0b4e42f08d53fd7189d2c51f2069be4aa1a9952c]
#
# patch "cset.cc"
# from [1fb425ad8e367a70b3cd9668c69bcf37b4d4418e]
# to [a388ba73b60d8b2b4f9617594f5bc770356bb08e]
#
===============================================
--- ChangeLog 461c25a92640c41f5d01de41ee652f4ba900980b
+++ ChangeLog 0b4e42f08d53fd7189d2c51f2069be4aa1a9952c
@@ -1,3 +1,7 @@
+2005-07-16 graydon hoare
+
+ * cset.cc: Bugfixing, copy-on-write, attribute support.
+
2005-06-19 graydon hoare
* cset.cc: Rewrite-in-progress on change_set.cc.
===============================================
--- cset.cc 1fb425ad8e367a70b3cd9668c69bcf37b4d4418e
+++ cset.cc a388ba73b60d8b2b4f9617594f5bc770356bb08e
@@ -1,158 +1,175 @@
/*
- an mfest is a collection of nodes, indexed by ident number and
- also held in a tree of nodes. the bucket of identified nodes is
- possibly larger than the tree of nodes.
+nodes:
+~~~~~~
- a change_set is a pair of mfests. every entry in each mfest exists in
- the other, though some might be detached from the file tree and just
- known by ident.
+ a node is either a file or a directory. nodes have attributes, an
+ ident, a parent-ident, and a set of heirs and sires. directory nodes
+ have a map of children and a map of clobbered-children. file nodes
+ have a content hash. see below for the definitions of these members.
+
+mfests:
+~~~~~~~
+
+ an mfest is an index-set of nodes X and a tree T of nodes starting
+ from a root. the index X maps ident numbers to shared pointers into
+ T. there may be entries in X which are not in T. T must always be a
+ well-formed tree.
+
+ an mfest has a normal form, in which:
+
+ - a node is in X iff it is in T
+ - all nodes in the tree have empty heirs and sires
+ (see below for definition of these)
+
+ serializing and re-reading an mfest normalizes it (though there is a
+ faster, in-place function to normalize them as well).
+
+
+csets:
+~~~~~~
+
+ a cset is a pair of mfests A, B. the mfests in a cset are *not*
+ normalized. it is an invariant that idents(A.X) = idents(B.X), but it
+ is only sometimes true that idents(A.T) = idents(B.T); some nodes
+ might be present in one mfest's tree but absent from the other's (if
+ they were added or deleted).
+
+
+change_consumers:
+~~~~~~~~~~~~~~~~~
+
a change_consumer -- and the serial form of a change_set -- is the
way a human wants to read about some work: organized into a set of
- deletes, moves, and adds.
+ deletes, moves, adds, deltas, and attribute set/clear operations.
- there is ambiguity in a change_consumer though, regarding the
- resolution of names and the simultineity of operations. specifically,
- each entry in a change_set is either looked up in the pre state or
- post state mfest of a change. a delete is looked up in the pre
- state. a rename's source in the pre state. a rename's destination in
- the post state. an add is looked up in the post state.
+ there is ambiguity in a change_consumer regarding the resolution of
+ names and the simultineity of operations. specifically, each entry in
+ a change_set is either looked up in the pre state or post state mfest
+ of a change. a delete is looked up in the pre state. a rename's
+ source in the pre state. a rename's destination in the post state. an
+ add is looked up in the post state.
when playing back a change_set, there is a canonical order in which
- entries are emitted: {delete,rename,add}{file,dir}.
+ entries are emitted: set_heir, delete, rename, add, set_sire,
+ apply_delta, clear_attr, set_attr.
furthermore within each type of entry, the order in which the entries
are emitted is specified: all entries are emitted in lexicographic
order, which happens to induce topological order as well (A is always
emitted before A/B).
- crucially, deletes are ordered by *source space* and adds and renames
- are ordered by *destination space*. we replay these by walking the
- tree rooted in each mfest.
+ crucially, set_heir and delete are ordered by *source space*; rename,
+ add, set_sire, apply_delta, clear_attr, and set_attr are ordered by
+ *destination space*. we replay these by walking the tree rooted in
+ each mfest.
-*/
+heirs and sires:
+~~~~~~~~~~~~~~~~
-/*
+ nodes may have heirs or sires. only nodes being deleted in a cset may
+ have an heir; only nodes being added in a cset may have a sire. the
+ heir of a node is a target to send future content deltas and
+ attributes to; it is a "forwarding address" used in cases where
+ two files with separate histories are considered identical in a merge
+ and "sutured": one node is deleted, and it marks the other node as an
+ heir. the name of the heir is looked up in the new manifest.
- aliases, splits, joins, nursery, graveyard:
-
- each node has a set of alias nodes.
+ only attribute and content-merging passes care about heirs and sires.
+ they do not affect lifecycle decisions during merging.
- if N is a node and A is an alias node of N in the old_manifest of a
- cset C, we say that A split from N in C. if A is an alias of N in
- the new_manifest of C, we say that A joined N in C.
+ an added node A has a node S as sire in cset C iff there is a cset C'
+ in which A was being deleted with heir S, and C' = inverse(C). in
+ other words, sires exist *only* to preserve information about heirs
+ during cset inversion. there are no user-accessible primitives for
+ creating sire relationships.
- if N is not an alias, but actually present in a manifest, we say that N
- is *live* in that manifest.
- - if N is live in the old_manifest of C, we say N is live-in in C.
- - if N is live in the new_manifest of C, we say N is live-out in C.
+generation numbers:
+~~~~~~~~~~~~~~~~~~~
+
+ every node in an mfest has a generation number. this exists to
+ implement copy-on-write semantics. the cset has a "write_generation"
+ number representing the generation which nodes should be updated to
+ when written to in the new mfest. when you want a node for writing,
+ you request it from the new mfest and the node-finder walks down the tree
+ looking. any time it hits an intermediate node which is less than the
+ write_generation, it makes a copy of that node and replaces the owning
+ pointer before walking into it. this ensures that the path you modify
+ is a truly new object, not a pointer into to the old tree.
- two special nodes exist *always* in every manifest:
- 1. the nursery
- 2. the graveyard
+attach and detach lists:
+~~~~~~~~~~~~~~~~~~~~~~~~
- - if A was split from the nursery in C, we say that A was added in C.
- - if A was joined to the graveyard in C, we say that A was deleted in C.
+ this is a subtle point. read it until it's clear.
- for any nodes M, N in C, we must have a proper lifecycle nesting. that is:
+ csets contain an implicit order to their operations: a block of
+ set_heir actions, then deletes, renames, adds, set_sires, deltas, and
+ finally attribute changes. moreover as we've seen above, within each
+ block there is a topological order: renames landing on A/B happen
+ before renames landing on A/B/C, for example.
- - if M splits from N in C then N must be live-in in C
- - if M joins to N in C, then N must be live-out in C
+ there is still, however, an ordering hazard when you apply renames
+ "in-place", either to an mfest or a filesystem: it's possible that a
+ rename-target will land on a name which is *in the process* of being
+ moved elsewhere -- by another rename in the cset -- but has not yet
+ been moved due to the serial order chosen. consider this cset:
- this is normally trivially satisfied by splitting and joining from the
- nursery and graveyard, respectively.
+ rename_file "A/B"
+ to "P/Q"
- splits exist only in order to serve as the inverse of joins. joins
- exist for only one purpose: when merging two trees which have
- different (non-identified) files occupying the same name, and the
- user decides they *want* to identify the files (eg. 2-way merge). in
- this case we want to join one file to the other, to give future edits
- on the joiner a target to apply to.
+ rename_file "P/Q"
+ to "Y/Z"
- yes? hmm.. we want to avoid this:
- ---------------------------------
+ this is the order they will be emitted in. because these names exist
+ in the same cset, they are to be executed "simultaneously" in theory;
+ but programs execute sequentially, so naively we would perform the
+ first rename, then the second.
- bob makes a tree and syncs it with jane.
- bob adds file foo
- bob syncs with sid
- jane adds file foo
- jane syncs with bob
- jane merges with bob. jane's foo is not the same as bob's foo. jane's foo wins.
- sid makes a change to bob's foo.
- sid syncs with jane.
- sid's change is silently eaten.
-
-
+ suppose we executed them "sequentially". for the first rename we
+ would do this:
- a point: every file *does* have a GUID. the (rev,path) pair it was added in.
- or at least it ought to. the problem is you can add a file in two (rev,path)
- pairs, and then later decide you *wanted* them to behave -- for the sake of
- merging -- as though they are the same file. iow, behave like they had a
- common ancestor immediately before the states they were both added in.
+ - find the node A/B in the OLD mfest, its file ident is file_0.
+ - find file_0 in the NEW mfest, its parent is parent_0.
+ - detach B from parent_0 in the NEW mfest.
+ - find P in the NEW mfest, its ident is parent_1.
+ - attach file_0 to parent_1 under the name Q.
- so how do joins help here? simply this: when you are working on a merge
- between A and B, the first step is to decide which files will be live
- in the merge. to do this you take all the files which are live in A and
- live in B. then for each file you've decided will be live, you pull out
- a subgraph for it, containing all the nodes in which it's alive. then you
- add to this subgraph all the nodes *joined* to the live one. that's the
- central trick: you will be doing graph contraction on the joiner as well,
- so that you do not lose any of its deltas.
+ this last action will fail. it will fail because parent_1 already has
+ a child called Q in the second mfest (it's the source of the next
+ rename, and it hasn't been moved elsewhere yet). we the readers know
+ that by the time the cset is finished applying, Q will have been
+ detached from parent_1 in the NEW mfest. but in the meantime we have
+ a problem.
- further restriction: foo/bar can only ever join foo/bar, and this can
- only happen in a merge in which one of the incoming edges is
- providing foo/bar already; it's actually written (unambiguously) in
- the joiner's changeset as 'join "foo/bar"' (inverse: 'split
- "foo/bar"'). so, scratch previous idea: adds and deletes are still
- expressed as being renamed to or from the nursery or graveyard, not
- split/joined with them.
+ so what we do instead is execute the first "half" of the renames in a
+ preliminary pass, then the second "half" of them in a second
+ pass. the change_consumer interface has a special
+ "finalize_renames()" method which triggers the second half; it is
+ called after the last rename is processed. here's how we *actually*
+ process the renames:
-
- further: suppose in A->C we have "join foo/bar", where there's a B->C
- edge containing a foo/bar to join with. all well and good. then
- suppose we append D="rename foo/bar foo/baz" to C. then the B->C->D
- edge has "rename foo/bar foo/baz", and the A->C->D edge has ...
- "join foo/baz"? that makes no sense.
+ - buffer all the renames into a vector of src/dst pairs, in order.
+ - when the user calls finalize_renames():
+ - build a temporary vector of idents with the same length as the buffer
+ - walk the vector from end to beginning, detaching each node from
+ the NEW tree and saving its ident in the temporary vector
+ - walk the vector from beginning to end, attaching each node to
+ the NEW tree, using the saved idents
- we know a join is essentially a "delete with a forwarding address";
- so like a delete it has to take effect *before* anything else in a
- cset. which means it is written in the pre-space of the cset. the
- problem is that the name it refers to is clearly a name in the
- post-space of the delta: the name it's colliding with!
-
- perhaps "join" is not the right word. perhaps we want something more
- like "doom foo/bar; rename foo/bar foo/baz" in the composite cset: an
- acknowledgment that the node named by foo/bar at the *beginning* of
- the cset is doomed, even if it gets moved around thereafter: the
- proviso of a doomed node eixsting in a revision is that there's a
- non-doomed node occupying the same node. a doomed node is not yet
- dead, but it adheres to some other node until that other node dies.
-
- subsequently if you doom the node foo/baz (which requires some other
- incoming node in a merge, using that name) then the composite cset
- is "doom foo/bar; doom foo/baz; rename foo/bar foo/baz" and you have
- *two* doomed nodes in the doomed set of the third node. this should
- continue to work until such time as the live node carrying the doomed
- ones actually dies.
-
- actually that reads terribly. it would be better to have "delete_foo"
- have a secondary clause, "with_heir", where the heir is a name in the
- post-space of the cset. this way the heir can get moved around; and
- the natural inverse is "add_foo" ... "with_sire", where the sire is
- in the pre-space of the cset. that feels more likely to work.
-
*/
#define __STDC_CONSTANT_MACROS
+#define __STDC_LIMIT_MACROS
#include
#include
#include