monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] What are rosters?


From: Nathaniel Smith
Subject: Re: [Monotone-devel] What are rosters?
Date: Fri, 17 Aug 2007 04:54:32 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, Aug 17, 2007 at 05:03:53PM +1000, William Uther wrote:
> Hi,
>   I'm trying to make DieDieDie merge go away.  I think I'm close,  
> but I need to understand the core monotone code better....
> 
>   So, er, exactly what are rosters?

A few different things -- mostly they are a data structure defined in
roster.{hh,cc} and used throughout the code.  There is also various
code in database.cc and roster.cc and roster_delta.cc to serialize
them to a format that can be stored on disk, and this involves delta
storage.  These are somewhat decoupled, though -- code that uses
rosters doesn't care how they are stored (they just use the high-level
"give me this roster", "store me this roster" methods), and the code
that stores rosters doesn't care about the details of their use, just
how to convert losslessly between in-memory data structures and
bytestreams that can go into the db.

Functionally, they are *the* data structure we use to describe the
state of a tree.  Basically they have one entry for each file and
directory in the tree, they have all the associated metadata for each
(names, any attrs, content hashes for files, etc.), they have stable
identifiers (node_ids) used to figure out which files/directories in
two different rosters correspond to each other, and they support
efficient access either by node_id or by path.

There is also the marking_map, which is a little confusing.  At the
level of data structures, marking_map and roster_t are separate, but
in the database they get bundled together -- so a database "roster"
corresponds to both an in-memory roster_t and an in-memory
marking_map.  The marking_map is tightly coupled to its corresponding
roster, and basically just contains for each scalar in that roster the
value of *(that scalar), in the *-merge sense.

As you say, *(A) is defined to be
erase_ancestors(marked_ancestors(A)), so there is sort of some
implicit pruning going on -- usually the mark set for a given scalar
is only a single revision, though it can be more (iff an accidental
clean merge occurred at some point in history).

>   So, is there a magic piece of documentation I'm missing?

Just the source code, which strives to be clear.  I have to tell you,
if you're planning to add resurrection support, then you're going to
end up needing to read and understand roster.cc and roster_merge.cc
pretty much in their entirety.  Hopefully the above few paragraphs
will help give you some clues how things fit together.

>   In particular, it seems that sometimes rosters are stored as  
> deltas, and sometimes complete.  Is there any rhyme or reason here?   
> Head is complete, others are reverse deltas?  Root is complete,  
> others are forward deltas?

At the moment IIRC leaves are stored in full while older rosters are
reverse deltas.  This is an implementation detail of the storage
system, though, and shouldn't really affect any of the rest of the
code.

>   My next question might be easier...  What's an easy way to get a  
> list of added or dropped files in a revision.  I'm thinking of a  
> function I could call from merge.cc : interactive_merge_and_store(),  
> and it would give me a list of node_ids.  Having added and dropped  
> files separately would be nice - maybe two functions?

Depends on what exactly you mean by "added or dropped", perhaps.
If you have two rosters, one old and one new, you can just look at the
node_id's contained in each, and everything that is in the new one but
not the old one is added, and everything that is in the old one but
not the new one is dropped.  make_cset does something like this to
generate the files_added/dirs_added/nodes_deleted fields in a cset.

Revisions are more complicated, because they can have more than one
parent.  You can ask which files were added or deleted versus each
parent (either from looking at the rosters, or looking at the csets
contained in the revision_t), or you might want to ask which nodes are
new in the child that existed in neither parent, and which nodes in
the child are gone that existed in both parents.  Or whatever.
Depends on what you're doing.

>   Finally, assuming I've worked out the merging issues (which  
> depends upon the answers above), I'd want to add a new command to  
> allow the resurrection of files.  I'm assuming the command would take  
> a file name and a revision and an optional new file name.  It would  
> then behave very similarly to the rename command... get the old  
> roster, clone the node, attach the new node to the new roster.  Does  
> that sound about right?

Plus workspace manipulation, presumably.  I suspect that by the time
you have a representation for resurrections that all the roster and
cset machinery can grok, you'll find that you have to do something
specific to that representation that is more complicated than what you
describe here, but dunno.  Hard to say until you get there.

-- Nathaniel

-- 
Damn the Solar System.  Bad light; planets too distant; pestered with
comets; feeble contrivance; could make a better one myself.
  -- Lord Jeffrey




reply via email to

[Prev in Thread] Current Thread [Next in Thread]