
Re: [Monotone-devel] Re: A Two-Fold Proposal: On Formats And Front-Ends


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: A Two-Fold Proposal: On Formats And Front-Ends
Date: Tue, 4 Oct 2005 16:45:42 -0700
User-agent: Mutt/1.5.9i

On Tue, Oct 04, 2005 at 02:02:14PM -0700, Graydon Hoare wrote:
> The framework for parsing basic_io is LL(1), predictive. As simple as
> possible. Read a token. The token must be a symbol. If the symbol is
> one you know, branch into the thing which reads the sequence of tokens
> (symbols, strings, or hex-blobs) for that symbol. Repeat. It's like

ObNag: we really should have a spec for basic_io in the manual...
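
Until there is a spec, here is a rough Python sketch of the "read a symbol, branch on it, read its tokens" loop described above. It is an illustration only, not monotone's parser: the token shapes (bare symbols, double-quoted strings with backslash escapes, hex blobs in square brackets) and the EXPECTED table of symbols are assumptions made up for the example.

```python
import re

# Assumed token shapes; this is an illustration, not monotone's grammar.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<symbol>[A-Za-z_][A-Za-z0-9_]*)
      | "(?P<string>(?:[^"\\]|\\.)*)"
      | \[(?P<hex>[0-9a-fA-F]*)\]
    )''', re.VERBOSE)

# Hypothetical table: which token kinds must follow each known symbol.
EXPECTED = {
    "old_revision": ["hex"],
    "add_file": ["string"],
    "patch": ["string"],
    "from": ["hex"],
    "to": ["hex"],
}

def tokenize(text):
    """Yield (kind, value) pairs; whitespace between tokens is ignored."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("bad token at offset %d" % pos)
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "string":
            value = re.sub(r'\\(.)', r'\1', value)  # undo backslash escapes
        yield kind, value
        pos = m.end()

def parse(text):
    """Predictive parse: one token of lookahead decides what to read next."""
    tokens = tokenize(text)
    entries = []
    for kind, name in tokens:
        if kind != "symbol":
            raise ValueError("expected a symbol, got %s" % kind)
        if name not in EXPECTED:
            raise ValueError("unknown symbol %r" % name)
        args = []
        for want in EXPECTED[name]:
            kind, value = next(tokens, (None, None))
            if kind != want:
                raise ValueError("%s: expected %s, got %s" % (name, want, kind))
            args.append(value)
        entries.append((name, args))
    return entries
```

Since the parser never backtracks, an unknown symbol or a token of the wrong kind is reported as soon as it is read.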

> Revisions, MT/options, MT/work, and .mt-attrs use basic_io. Manifests
> and .mt-attrs are going away, replaced by rosters. Rosters use
> basic_io. I intend all future formats in monotone to use basic_io.

I'm not entirely convinced that rosters should use basic_io, because
1) they're completely internal, they say things like "no backwards
compatibility guaranteed here" in big letters, and 2) it may be
necessary for speed to switch to something more quickly parseable (cf.
manifests, where we ended up hand-coding a parser instead of using
boost regexps, because profiling showed it to be worth it).

There still is a "manifest" data format, and that will definitely be
basic_io.  That's what will continue to exist in the external
interface; when we talk about manifests "going away", we mean that
they won't be used internally anymore.

> basic_io was designed to be *relatively tolerable* for humans to read
> as well as machines. Hence the stanza-alignment, stanza-based line
> breaking, and use of quoting. This was to reduce the need for multiple
> output formats in the "status" and "commit" commands: we just print
> the internal, hashed representation to the screen. My intention is to
> change the "ls certs", "ls keys" and "log" outputs to be basic_io
> someday as well; probably when certs get their much-needed overhaul.
> Sooner if someone else helps :)

"automate certs" is basically "ls certs" but using basic_io.  We
definitely need a well-parseable way to get keys; no one has gotten
around to it yet.  (I actually ran into this recently in the .dumb
branch; for now I hackishly parse "ls keys", but this is all wrong.)

It's entirely possible that "ls certs" should just do the same output
as "automate certs" does now.

"ls vars" is another thing that could well be switched to basic_io.
So is "trusted".

> There is a small, mostly-unused corner of monotone's i/o machinery
> called "packets", which were intended, long ago, to generate and
> consume non-whitespace-sensitive transport encodings, for example,
> when sending things through email. This was added back when whitespace
> actually played a role in some parsing operations. Since all of the
> packet objects in question are being shifted to basic_io anyways
> (which normalizes whitespace after parsing whitespace-insensitively),
> we should probably discard the packet format too. A lot of this is
> about available effort and time.

The use case for packets -- dumping data out of the db in an external
format that can be shuffled around as a blob of arbitrary bytes, and
then later re-imported -- is still important.  I don't particularly
care what the actual format used is.  The use case does make it
relatively unimportant that they actually be readable by anyone, so
the current stuff doesn't bother me that much.  A version that was
human-readable might well be nice for things like sending patches to
mailing lists, but for that you often want to also do things like roll
up multiple commits into a single "diff", so I'm not sure how great a
use case this is...

> Some commands, as you've noted, still stand out. I can identify three
> families of commands here:
> 
> 1. There are commands which produce simple lists of newline-delimited 
> filenames.
> 2. There are 'automate' commands which produce custom formats.
> 3. There is a --brief format.
> 
> I am fine with #1, in the sense that there is already special
> command-line treatment for filenames (all non-option, positional
> arguments are, or should be, treated as filenames), and the "unix
> tradition" of for example piping a list of filenames to xargs is worth
> supporting. I did not implement #2 or #3, and honestly I would not
> have done them the way they're done. But you know, it's not 100% under
> my control. Not even 10%. I think I was even on vacation when automate
> happened.

Hmm, I wasn't, so I should probably give some context :-).

> I can tell you what my preference is, though: I'd prefer if the
> automate commands all pumped out basic_io stanzas. I'd prefer if you
> could send basic_io stanzas to monotone as command sequences (say, for
> monotone stdio). I'd prefer if all commands could be invoked via
> stdio. And I'd prefer, rather than per-command things like --brief,
> that we do what marcel suggested (and what, if you look, the ROADMAP
> file has listed for some time), and give lua hooks a chance to control
> output formatting in general, so that if you *don't* want basic_io,
> there's something simple and general you can do about it.

The problem with basic_io in this context is that basic_io and the
unix shell utils hate each other.  Or maybe that's too strong, but
anyway they don't get invited to each other's parties.  One instance
of the "language binding" problem that people tend to forget, but that
is vitally important to a VCS, is bindings to that language known as
"sh".  Basically automate is the way it is, because I tried to
keep to whitespace separated lists, one per line, as the most simple
and universal format -- one that's trivial to deal with from the
shell, plays well with xargs (e.g., one can do quite complex things by
piping automate commands into each other via xargs), and is trivial to
handle from any other language as well.  (Most languages these days
can snarf in lines and split on whitespace in, like, 2 lines of
code...).
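
To make the "2 lines of code" claim concrete, here is a minimal Python sketch. The function name parse_lines is made up for the example, and the input is simply whatever text a line-oriented command printed; nothing here depends on a particular automate command.

```python
def parse_lines(text):
    """Split line-oriented output into one list of fields per non-blank line."""
    return [line.split() for line in text.splitlines() if line.strip()]
```

Each inner list is one output line, which is also the unit that xargs and friends work with.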

(Actually, the one exception to this is that "inventory" currently
uses a fixed width field followed by a number of space-separated
fields, but having actually used it a few times now I'm pretty
convinced this is a mistake.)

In a few places (maybe only "certs"), it turned out that our data had
more structure to it than could be easily encoded in this simple
meta-format, so we used basic_io.  (In particular, we had fields that
might contain spaces and newlines.)
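
A small Python sketch of why such fields need a quoted format: a value containing a newline spans two lines once printed, so splitting on lines no longer recovers it, whereas a quoted stanza keeps it as one token. The escaping rule (backslash and double quote only), the right-aligned keys, and the field names are assumptions for illustration, not a statement of what monotone emits.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'

def format_stanza(pairs):
    """Render (key, value) pairs as one stanza with right-aligned keys."""
    width = max(len(key) for key, _ in pairs)
    return "".join("%s %s\n" % (key.rjust(width), quote(value))
                   for key, value in pairs)

# Hypothetical cert-like data; the second value holds a space and a newline.
example = [("name", "changelog"), ("value", "two words\nand a newline")]
```

A reader that understands the quoting sees the embedded newline as part of the string; a reader that splits on lines would see the value cut in half.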

I'd be interested in figuring out how to provide tools to manipulate
basic_io stuff from the shell, but I'm not sure what they'd look
like...

The trade-off "use line-oriented formats when possible, use basic_io
as the 'more complex' format when more complexity is necessary" does
still seem reasonable to me, but I'd be interested to hear other
views...

As for "stdio", I dunno, maybe there's some sane way to do it with
basic_io.  The basic problem that's being solved there is how to
encapsulate a bunch of discrete textual packets inside a larger
continuous stream protocol.  It might get a bit confusing to
read (having basic_io chunks containing more basic_io chunks), and
require us to rewrite the parser or something to be incremental (able
to deal with incomplete data by saying "give me more data and try
again", etc.).  Neither of these is really a killer, except I'd want
to make sure that they didn't mean we actually end up making front-end
writers' jobs _harder_ in the name of making them simpler...
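
For reference, the "discrete packets inside a continuous stream" problem with an incremental reader can be sketched in a few lines of Python. The framing used here (a decimal byte count, a colon, then the payload) is an assumption chosen to keep the example short; it is not meant to describe the actual stdio format, and the names PacketDecoder and encode are made up.

```python
def encode(payload):
    """Frame a payload as b'<length>:<payload>'."""
    return str(len(payload)).encode("ascii") + b":" + payload

class PacketDecoder:
    """Incremental decoder: feed it bytes as they arrive, get whole packets."""

    def __init__(self):
        self.buf = b""

    def feed(self, data):
        self.buf += data
        packets = []
        while True:
            head, sep, rest = self.buf.partition(b":")
            if not sep:
                break  # length prefix not complete yet; wait for more data
            size = int(head)
            if len(rest) < size:
                break  # payload not complete yet; wait for more data
            packets.append(rest[:size])
            self.buf = rest[size:]
        return packets
```

The payload is opaque to the framing, so it may contain colons, newlines, or nested structured text without confusing the reader. Doing the same with nested basic_io would need the parser itself to be able to stop and resume like this.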

As for --brief, lua hooks, and so on, customizability is all well and
good, and I would definitely support ways to customize output (though
I have no idea at all how to support it in the general case; previous
efforts (e.g., the formatting branch) have focused on the simple and
pretty consistent 'automate' output formats).  However, in the general
case this is a cop-out; we still have the responsibility to find the
best formats for people to see as defaults.  And it's not obvious that
the best format for human consumption will always be basic_io.

The point of the automate split is, of course, so we can make the
question "what format is best for humans?" and the question "what
format is best for computers?" into different questions...

-- Nathaniel

-- 
"...All of this suggests that if we wished to find a modern-day model
for British and American speech of the late eighteenth century, we could
probably do no better than Yosemite Sam."



