emacs-devel
Re: base


From: Stephen J. Turnbull
Subject: Re: base
Date: Thu, 26 Aug 2010 19:25:35 +0900

Eli Zaretskii writes:

 > That's not a "mental model", at least not by your definition

Of course it is, by *my* definition.  Of course I can't speak for
yours, though.  *sigh*  I guess I'm going to have to go into
interminable detail on this....

 > I don't see how users would need to know that stuff in order to be
 > able to use the tool safely and efficiently.

Really?  Don't you mean "I'd like to believe that users don't need to
know, because it's inconvenient and tedious"?

 > It's like saying that Emacs users need to know how Lisp data types
 > are implemented or what is the glyph matrix, in order to make good
 > _use_ of Emacs (as opposed to _extend_ it).

No, it isn't.  Your analogy is broken, because you are ignoring the
difference between private and public data.  When you use Emacs as an
editor, the public data is text.  When that text is structured, you
demand that other contributors understand the structure of the text,
whether it's C code, Lisp code, Texinfo documentation, docstrings, or
even ChangeLogs and the NEWS file.  Lisp data and the glyph matrix are
private data, and it's reasonable to ask Emacs to deal with them and
not bother you with those details when you're using Emacs.  You can
even ask Emacs to help with the syntax of text.  But eventually you
have to tell the users "You need to know Texinfo or whatever to work
on the manual, and you need to know when to use @code and when to use
@samp."  Emacs will help you produce correct syntax for @code, but it
can't tell you when to use it.

The history DAG and the commit metadata are public data.  They are
shared, they can be seen not only by the user who produces them but by
everybody in the project.  There are choices to be made, and they
don't have technical answers; they're matters of taste and policy.
At this stage of the technology, you cannot expect the VCS to make
correct decisions according to project policy about when and where to
branch, which branch to merge into which other, or when to rebase and
when to merge.

Project policy about what constitutes a "nice readable" history is a
matter of taste and a matter of software capability.  If users are
going to participate in those discussions, they need to understand the
capabilities of the software.  Bazaar is very feature-poor as far as
history restructuring goes; if you want a particular structure, you
need to follow a workflow that produces it.  Those workflows turn out
to be more complex for Emacs than the old workflows, and people
immediately complained.  That resulted in people posting alternative
workflows which work in *some* situations but not others, and yet
other people screwing up history by taking shortcuts with the
alternative workflows without having any idea what they were doing to
the DAG.

 > Sure, it's nice to know all that, but it isn't (and shouldn't) be
 > necessary for a user.  If you want to _extend_ the tool, then yes,
 > you'd need this and some more.

You may be right about "shouldn't", but you are wrong about "isn't",
at least given the capabilities of the chosen tool and at the level of
anybody who wants to participate in discussions about what the Emacs
workflow should be.

 > > I don't think any such thing exists for bzr.

 > http://doc.bazaar.canonical.com/bzr.2.2/developers/overview.html
 > and other docs in that area come close, maybe.  But that stuff is
 > rightfully in the developers' department, IMO.

And in mine.  That is not even on the same planet as a mental model.
It's full of details that developers need to know to write code that
works with Bazaar internals, but it is completely unnecessary for
explaining how the things the user can see work together with each
other and with the user.  And it is no help in understanding
externally visible behavior, whether intentional or buggy.

By contrast, the object database model as presented by the Git
Community Book is abstract.  In fact, if you look into a real git
ODB, you'll just see files full of apparently random bits -- they're
all delta-compressed and zlib-deflated.  Such details are not part of
the mental model, and they're not in the book (not emphasized in that
chapter, at least).
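To make that concrete, here is a throwaway sketch (every file name and
commit message below is invented) of poking at the object database
with cat-file, the same command gittutorial-2 uses:

```shell
# Build a scratch repository and inspect its objects directly.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo 'hello' > greeting.txt
git add greeting.txt
git commit -qm 'first commit'

# Commit, tree, and blob are all content-addressed objects:
git cat-file -t HEAD            # prints: commit
git cat-file -t 'HEAD^{tree}'   # prints: tree
git cat-file -p 'HEAD^{tree}'   # lists greeting.txt and its blob hash
```

The point is that every node gitk draws corresponds to an object you
can name by hash and dump with cat-file.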

And that model can be visualized directly in git by gitk.  If you can
see it there, it's in your repository.  You can look at directory
objects or at files (the GUI version of the use of cat-file in
gittutorial-2), or at diffs.  This is not necessarily true in bzr,
which supports "ghost revisions" -- don't ask, I don't know the
details.  But that has to complicate the mental model, and it
definitely complicates and weakens the implementation: "ghost
revisions" are implicated in several of the cases of wedged branches
I've seen reported on address@hidden.

You can logically traverse the graph you see in gitk using reset or
checkout, and it will have a visible effect on the display in gitk
(after doing refresh or maybe reload).
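As a sketch (two invented commits in a scratch repository), the same
traversal from the command line: checkout moves HEAD around the graph,
while reset moves the branch itself.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo one > file; git add file; git commit -qm c1
echo two > file; git commit -qam c2

git checkout -q 'HEAD^'      # detach HEAD onto the first node
cat file                     # prints: one
git checkout -q -            # back to the branch tip
cat file                     # prints: two
git reset -q --hard 'HEAD^'  # move the branch head back one node;
                             # gitk would now show c2 as unreachable
```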

Modern versions of gitk will also display a representation of the
index to you (which you can also get from the status command).  The
bzr model behind status is much fuzzier (although this probably
doesn't matter as much as the object model).
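A quick, made-up illustration of the index being a first-class,
inspectable thing: status distinguishes what is staged from what is
merely sitting in the working tree.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo one > file; git add file; git commit -qm c1

echo two >> file; git add file   # staged change (now in the index)
echo three >> file               # further change, working tree only

git status --short file          # prints: MM file
git diff --cached --stat         # exactly what a commit would record
```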

So having a model means that it is easy to understand what a git
operation will do in terms of the graph you see in gitk.  "commit"
makes a node and links it onto the head of the active branch;
"branch" and "tag" produce labels; and so on.  Fetch adds hidden
nodes and arcs to the graph.  Pull or merge makes them visible by
attaching them to the visible part of the graph, committing on top,
and then advancing the branch head to the new commit (thus making the
new nodes "reachable from HEAD" in graph-theoretic terms; gitk will
adjust the display by showing these newly reachable nodes upon
refresh/reload).  In bzr, however, it's not so obvious how to think
about this.  The commit is not automatic, so "merge" seems to be
about content rather than the DAG.

And there's one really big difference.  In the mental model I use, git
objects are *eternal* and *universal*.  They've always existed, they
always will, and they're the same for everyone.  This is because of
the (model) 1-1 map between SHA1 hash values and all the possible
objects that could in theory exist.  (In reality, a hash isn't good
enough, for "eternity" and "all" you really want something like Gödel
numbers but with better compression properties -- a typical difference
between "mental model" and "implementation".)  In the actual git
implementation, once you've seen an object's content, it's "eternal
enough".  That is, a non-garbage object is never deleted, and even
garbage is kept around for 60 days by default.  (Hey, that's even
better than Scheme's promise of eternity!  And if that worries you,
you can set garbage collection to "never".)  Once you realize that,
recovery from rebase madness or inadvertent deletion of a branch[1]
becomes a certainty -- you just need to find the command that does the
tracing part of garbage collection, which will list the dangling heads
and other unreachables for you.  This may not mean much to you,
but I think of git as very reliable because I know what it promises
about my data, and I'm quite confident that Linus's original design
was straightforward and good ("just how hard can it be to design a
linked list?" he asks the Emacs developers), and that improvements
since then didn't require a genius to work on the ODB, just ordinarily
smart hackers.
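Both halves of that claim can be checked in a couple of scratch
repositories (names invented): hash-object shows the universality, and
fsck does the tracing that finds "destroyed" commits.  (fsck treats
reflog entries as roots by default, hence --no-reflogs here.)

```shell
set -e
work=$(mktemp -d); cd "$work"

# Universality: the same bytes get the same name in any repository.
git init -q a; git init -q b
h1=$(cd a && echo 'same bytes' | git hash-object -w --stdin)
h2=$(cd b && echo 'same bytes' | git hash-object -w --stdin)
[ "$h1" = "$h2" ] && echo 'same hash in both repos'

# Eternity, near enough: a commit survives being reset away.
cd a
git config user.email you@example.com
git config user.name you
echo x > file; git add file; git commit -qm keep
echo y > file; git commit -qam oops
lost=$(git rev-parse HEAD)
git reset -q --hard 'HEAD^'          # "destroy" the oops commit
git cat-file -t "$lost"              # prints: commit -- still there
git fsck --no-reflogs --unreachable | grep "$lost"
```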

This *simply isn't true* of bzr.  Eternity is a little shaky.  bzr
uncommit will (eventually) destroy the commit, and nothing in the bzr
docs or that I've read on address@hidden gives me a model of *when*,
except that I sorta think that it won't be copied when you branch from
that branch.  Practically, the bzrtools plugin does have a "heads"
command which will allow you to find such objects, so probably this is
no worse than git.  But not having a model worries me.  Worse, bzr
objects are definitely *not* universal from the external viewpoint
(e.g., two identical programs can be different objects in Bazaar).  bzr
tracks "containers" (delete all the content and metadata like name
from a file, and you have a "container").  This is what allows it to
do the much-vaunted "provably correct tracking of copies and renames".
Containers have identifiers (those arch-ids you see in Emacs files are
actually container ids), and those ids depend on who created them,
where, and when.  Two containers can be exactly the same as files, but
they'll be different anyway.  I don't have a model of this.  That is,
I don't understand why this is the provably correct way to treat
renames and copies.

And who knows how the history DAG is represented in Bazaar?  I don't,
and I suspect it changes from branch format to branch format.  Why do
the "rich root" formats need to be incompatible with the non-rich-root
formats, and what are they good for anyway?  You got me on both
counts.  This kind of thing just goes on and on and on and on and on
and on and on and on and on and on and on and on and on and on and ....

Note that I'm not saying that Bazaar is inconsistent or incoherent.
I'm saying that I personally have trouble giving an account of why
it's consistent and coherent.  I would not want to try to debug *any*
failure in bzr without the help of the developers, because I have no
idea where I'd start.  OTOH, I can give an account of why git does
what it does, and I've had success in teaching that model to others.
Although the first thing I'd do in case of a bug is report it, the
second thing I'd do is to start browsing code.  I think I'd have a
good chance of localizing the issue, because I have a model of how git
is supposed to work.  YMMV.

Footnotes: 
[1]  Branches are *not* objects, they are *refs* = references to
objects, and so can be created or deleted at will without breaking any
promises about object permanence.
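The footnote can be demonstrated in a scratch repository (invented
names): deleting a branch removes only the ref, never the object it
pointed at.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo x > file; git add file; git commit -qm c1

tip=$(git rev-parse HEAD)
git branch topic                     # a second ref to the same object
[ "$(git rev-parse topic)" = "$tip" ] && echo 'same object'
git branch -d topic                  # delete the *ref*...
git cat-file -t "$tip"               # ...prints: commit -- intact
```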



