[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit
From: |
Tom Lord |
Subject: |
Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit |
Date: |
Sun, 11 Jul 2004 22:35:23 -0700 (PDT) |
> From: Robert Collins <address@hidden>
> > * Extend the "dumb server" archive protocol
> > Clients should be able to "almost commit" a revision by
> > renaming it to something like:
> > patch-<N>--<secret>--<other-revision>--<other-archive>
> > instead of just "patch-<N>".
> > Such a "half-committed" revision should act just like a=20
> > held lock.
> > The lock-breaking algorithm has to be modified. Asked to
> > break a half-committed revision, it _must_ ensure (creating
> > if necessary) the existence of the revision:
> Why would it create this other revision? Wouldn't the absence of the
> other revision be a sign to rollback?
? No.
A good c.s. database theory book can give a more general framework but
I'll put it in informal operational terms that are arch specific:
Let's say that we want to atomically commit, in a bunch of different
revisions, the revisions A, B, C, D, .....
We want all of those commits to occur or none of them: nothing in
between.
So we pick one, let's say A, to be the "transaction leader". In the
two-stage commit algorithm, we're going to ensure that if A commits,
then all of B, C, D, etc. are guaranteed to commit. If A does not
commit, then none of B, C, D, etc. will commit.
We do this in two stages (or, I guess the usual phrase is "two
phases"):
First, we "half commit" or "conditionally commit" B, C, D, etc. They
are half-committed meaning that a reader will not recognize them as
being committed unless that reader is certain that A has, in fact,
been committed.
Second, we fullly commit A. If that succeeds, B, C, D, etc. are now
impolicitly converted to "full commits".
After those two phases we can clean up a bit. We can note that B, C,
D, ... are no longer "half committed" or "conditionally committed".
We can mark them as fully committed so that reader don't have to
verify that A has been committed. But this is just paperwork... it's
the two phases described above that really matter.
Note that the clean-up step is just an optimization. A reader who
encounters a half-committed B can't decide if that's a completed
commit or not without observing whether or not the corresponding
commit of A is complete.
While B is "half committed" but before A is committed, B functions
like a lock. Nobody else should commit ahead of B. The commit of B
has to be canceled or completed before someone else can commit to that
same version. So between phase 1 and phase 2, the "half commit" of B
is basically an ordinary version lock.
Let's say we want to break that lock. We can't do that just locally
to B. Some other client may be convinced that B is half committed
and, on the basis of that assumption, commit A. If A is committed
but the B commit broken, then we've had an atomic commit of only A, C,
D, E, .... leaving out B: a bug.
In terms of traces, a client who wants to atomically commit A, B, C,
... will do:
half-commit B
half-commit C
half-commit D
half-commit E
....
commit A [*]
At the step marked [*], B, C, D, ... all become (instantly and
implicitly) complete commits.
Now, let's add a second client who's goal is to break the lock on B.
The parallel traces are:
client 1: client 2:
half-commit B
half-commit C
half-commit D un-commit (break lock) B
half-commit E
....
commit A? [*]
The successful lock break by client 2 implies that the [*] of client 1
must fail.
How can we make that fail? Client 1 can't just "double check" that
B, C, D, etc. are half committed. This won't work:
DOES NOT WORK:
client 1:
half-commit B
half-commit C
half-commit D
half-commit E
....
double-check B
double-check C
double-check D
...
commit A? [*]
because, for example:
DOES NOT WORK:
client 1: client 2:
half-commit B
half-commit C
half-commit D
half-commit E
....
double-check B
double-check C un-commit (break lock) B
double-check D
...
commit A? [*]
Oops. Even after we double-checked B, somebody else broke the lock
for B.
The only way that client 2 can prevent client 1 from committing A is
to commit something else in place of A:
DOES WORK:
client 1: client 2:
half-commit B
half-commit C
half-commit D commit A', usurping A
half-commit E
....
commit A?: no, fails,
because A' was committed
To break the lock on B, client 2 has to commit a revision to A which
says, in effect, "the composite transaction for which this is the lead
revision failed; any half-committed revisions waiting for A are, in
fact, broken locks".
> On a related note, I am planning on implementing two-phase commits with
> non-tla resources.
In english please? "non-tla resources?"
> Would this fit well within your architecture above?
No idea since the question is hopelessly vague.
> My current thoughts are just to modify commit & lock-revision to be able
> to leave a all-but-committed changeset in place, and to
> renamed-to-fully-committed from the command line.
That approach, if I understand what you mean, will fail to have ACID
properties and will therefore be a
_MAJOR_HUGE_VERY_VERY_BIG_AND_SERIOUS_ regression.
Ok, the word "regression" is slightly misleading since it wouldn't
break existing functionality. But it would add new functionality
that is incredibly broken in a way that existing functionality
carefully avoids.
-t