Re: [Gnu-arch-users] Re: situations where cached revisions are not so go

gnu-arch-users

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] Re: situations where cached revisions are not so go

From:	Jason McCarty
Subject:	Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Date:	Sun, 28 Sep 2003 01:56:41 -0400
User-agent:	Mutt/1.5.4i

Tom Lord wrote:
>     >   I don't make use of cachedrevs, but they could be
>     >   integrated if desired (for example if a revision is really huge, a
>     >   cachedrev might be better than summaries).
>     > 
>     >   Implementation of arch_build_revision(ALMOST, TARGET, DESTDIR):
>     >   1) If ALMOST is nil:
>     >      2) Try to find a pristine or libraryrev ALMOST which is the
>     >         greatest lower bound of TARGET, with the same version as TARGET.
>     >      Else:
>     >      3) Set ALMOST to TARGET's base-0.
> 
> I sort of follow you here.   `versions' (in the arch sense of logical
> development lines) don't come into play directly.   We have to talk
> about `ancestry'.

As a useful first approximation, my algorithm assumes that the entire
ancestry resides within one version. If this assumption is incorrect, it
will be changed in step 8, when a continuation from another version is
reached. I believe this will follow the ancestry correctly.

I should have explained my least upper bound (lub) and greatest lower
bound (glb) operations more completely, as I think much of your
criticism stems from not catching my meaning. Because of the assumption
that the whole ancestry is in the same version, the lub/glb calculations
become almost trivial; they are just making comparisons between integers
(although comparing the versionfix versions needs an adjustment).

So if GOAL is a--b--1.0--patch-39 and we have pristines for
a--b--1.0--patch-{13,17,21,52}, then glb(GOAL) = a--b--1.0--patch-21,
since 21 <= 39 and 21 >= 13 and 17. So a--b--1.0--patch-21 would be
ALMOST (even though it may not be correct, it will be fixed later).

Similarly, with the same ALMOST and GOAL, if there are summaries from
a--b--1.0--patch-{12,23,27} to GOAL (they would be in GOAL's directory)
then lub(ALMOST) = a--b--1.0--patch-23, since 23 >= 21 and 22 <= 27. So
DELTA would be summary-23-39, and ORIG would be a--b--1.0--patch-23.
Even if there is a continuation from another version between patch-23
and patch-39, it doesn't matter, because the summary is an exact delta
between them, and we have patch-23 in full. Granted, the summary may be
bloated if the continuation caused much change from the previous
revisions.

> We don't know, a priori, the entire ancestry of GOAL.   To find it
> out, we have to walk it backwards, querying an archive each step of
> the way.
> 
> So, we can't pick the closest ancester ALMOST without some server
> round-trips here -- but that's OK, because we'll need to make those
> very same round-trips no matter what.

Which is why I make an initial assumption and correct it as necessary
while walking the tree backwards.

> There is a slight problem with your approach.   We can't count on
> access to the oldest ancestor.  It may very well be in some archive
> that we don't have access to.   We pretty much _have_ to be willing to
> set ALMOST to an archive-cached revision.
> 
> Another problem is that the oldest ancestor may be very far away
> compared to the nearest archive-cached ancestor -- another reason we
> can not ignore archive cached revisions.

That's possible. I treated cachedrevs as an afterthought when I should
have implemented them fully. In step 6, if a cachedrev for GOAL existed
you could retrieve it and return. I think it might be advantageous to
have a flag in ~/.arch-params/=locations for each archive that stated
whether to utilize cachedrevs in arch_build_revision for that archive.

> (Oh, and, just as a point of amusing trivia:  unpacking a _locally_
> archive-cached revision is notably faster than copying that revision
> from a revision library.)

I didn't know that. The current algorithm also uses revlibs in
preference to cachedrevs, doesn't it?

>     >   4) If ALMOST == TARGET, copy ALMOST to DESTDIR and return (may have to
>     >      call arch_build_revision if it's a continuation, or connect to
>     >      archive to grab import).
> 
> Sure, if ALMOST is the GOAL and we've managed to find a copy of ALMOST
> -- we're done.
> 
> 
>     >   5) Connect to TARGET's archive.
>     >   6) Find a DELTA to TARGET whose FROM endpoint is the least upper bound
>     >      of ALMOST (this will be either a revision or a summary delta in
>     >      TARGET's directory). Continuations count also; let FROM be the
>     >      revision it's a continuation of. Use a continuation from another
>     >      version if it's all that's there.
>     >   7) If DELTA is a continuation from a different version than TARGET:
>     >      8) Call arch_build_revision(0, FROM, DESTDIR).
>     >      Else call arch_build_revision(ALMOST, FROM, DESTDIR)
>     >   9) Retrieve and apply DELTA to DESTDIR.
> 
> Ok, assuming that I follow you:   the oversimplification in what
> you're saying is that you can not ignore archive cached revisions.
> You can't count on searching way far back in ancestry.   Searching
> back in ancestry requires one round-trip per step.   If the summary
> delta you find is too far back relative to the next nearest cachedrev,
> this is a pessimizing algorithm.   etc.

Part of the point of summaries (at least in my implementation) is that
you don't have to search the ancestry revision-by-revision. You skip
large parts of it, which _should_ actually result in fewer round-trips.
Knowledge of the complete ancestry is discovered as we go, and not all
of the ancestry need be examined.

So it's not any more pessimizing than cachedrevs currently are.

> It's a graph searching problem.   We have the ancestry graph which is
> just a linear, directed graph.   Now you're adding these summary-delta
> links to the ancestry graph so we have things like:

I utilize the ALMOST variable to force which path will be followed, so
we effectively still have a linear graph.

>    GOAL --> GOAL-ancestor --> GOAL-ancestor^2 .... -> GOAL-ancestor^N ...
>                \                               ^
>                 \                             /
>                  `---------------------------'
> 
> Each node on the graph is a fully qualified revision name.   For a
> given revision name, we can cheaply ask: "do I have a pristine?" and
> "do I have a rev lib entry?"
> 
> For the cost of a server roundtrip, we can ask: "is it archive
> cached?"  and "is there a summary delta here" and "what is the name of
> the next ancestor?"
> 
> Initially, we know only "GOAL".
> 
> We can build GOAL by walking backwards along this graph until we find
> a sufficient collection of bits and pieces to assemble GOAL.   With
> summary-deltas, and a more complicated build_rev algorithm, now we
> have multiple paths we might want to explore.

More complicated algorithms could certainly put forth lots more effort
to follow the graph in the most efficient way, yes. I've chosen a simple
algorithm to avoid doing complicated graph searches.

> One difficulty of your algorithm, if I'm following it correctly, is
> these "least upper bound" computations -- they assume more knowledge
> of the ancestry graph than we have on hand.

I think I've shown above that they are trivial to calculate, and that
they eventually get corrected in step 8 if they're wrong.

> Your algorithm isn't _wrong_, by a long stretch -- it's just missing
> consideration of archive-cached revisions and missing a bunch of
> important details.  (So, keep going?   Maybe this is a good time at
> which to start playing with the code.)

I should give cachedrevs more consideration, although I don't think
there's a huge amount of detail to add than what I wrote in this
message. I _wish_ I had time to try implementing it, but I've spent far
too much time thinking about it already. Perhaps next weekend I'll be
able to attempt an implementation.

>     > As you can see from my algorithm, the amount of extra code is not that
>     > large. 
> 
> I don't see that yet.   Might be true -- but I don't see it yet.

That was hand waving on my part, sorry. We'll both find out if/when I
write some code.

Jason

[Prev in Thread]

Current Thread

[Next in Thread]

[Gnu-arch-users] Re: situations where cached revisions are not so good, (continued)

Prev by Date: Re: [Gnu-arch-users] configs - extracting from arch
Next by Date: Re: [Gnu-arch-users] Re: give us a hand with arch
Previous by thread: Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Next by thread: Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Index(es):
- Date
- Thread