
[Qemu-devel] block: format vs. protocol, and how they stack


From: Markus Armbruster
Subject: [Qemu-devel] block: format vs. protocol, and how they stack
Date: Fri, 18 Jun 2010 14:59:37 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

The code is pretty confused about format vs. protocol, and so are we.
Let's try to figure them out.

From cruising altitude, all this format, protocol, stacking business
doesn't matter.  We provide a bunch of arguments, and get an image.

If you look more closely, providing that image involves sub-tasks.  One
is to haul bits.  Another one is to translate between bits in different
formats.

Working hypothesis:

* A protocol hauls image bits.  Examples: file, host_device, nbd.

* A format translates image formats.  Examples: raw, qcow2.

Note: this does *not* follow the code's use of the terms.  It had better
not, because the code is confused.

Both protocol and format provide an image.  That's why we can and in
fact do have a common abstraction for them: BlockDriver.  Our data type
for a block driver instance is BlockDriverState.

Nothing stops a block driver from translating and hauling at once.  We
generally separate the two jobs, because it lets us combine the
different ways to translate with the different ways to haul.
Nevertheless, a block driver *can* be both format and protocol.

Example: vvfat arguably both translates and hauls.

Let's call a format that isn't also a protocol a pure format, and a
protocol that isn't also a format a pure protocol.

Obviously, pure formats need to sit on top of something providing images
to translate.  Formats don't care whether those somethings translate or
haul.  Therefore, a pure format is always stacked on one or more
BlockDriverStates.

Example: raw is always stacked on exactly one BlockDriverState (stored
in bs->file).

Example: qcow2 is always stacked on exactly two BlockDriverStates
(stored in bs->file and bs->backing_hd).

Conversely, anything that isn't stacked on any BlockDriverState can't be
a pure format, and thus must be a protocol.

Example: file hauls an ordinary file's bits, nbd hauls bits over TCP
using the NBD protocol.

Summary so far:

1. BlockDriverStates form a tree.

2. The leaves of the tree are protocols, not pure formats.

3. The non-leaf nodes may be anything.  We haven't found a reason why
   not.
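
To make the summary concrete, here is a minimal sketch of the tree and
of rule 2.  The types are hypothetical, heavily simplified stand-ins for
BlockDriver and BlockDriverState; the "hauls" and "translates" members
are my own illustration and do not exist in the code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriver: just the two roles. */
typedef struct DemoBlockDriver {
    const char *name;
    bool hauls;        /* acts as a protocol */
    bool translates;   /* acts as a format */
} DemoBlockDriver;

/* Hypothetical stand-in for BlockDriverState: a node with two child slots. */
typedef struct DemoBlockDriverState {
    const DemoBlockDriver *drv;
    struct DemoBlockDriverState *file;        /* may be NULL */
    struct DemoBlockDriverState *backing_hd;  /* may be NULL */
} DemoBlockDriverState;

/* A pure format translates but does not haul. */
bool demo_is_pure_format(const DemoBlockDriver *drv)
{
    return drv->translates && !drv->hauls;
}

/* Rule 2: true if no leaf of the tree rooted at bs is a pure format. */
bool demo_leaves_are_protocols(const DemoBlockDriverState *bs)
{
    if (!bs) {
        return true;                /* empty child slot, nothing to check */
    }
    if (!bs->file && !bs->backing_hd) {
        return !demo_is_pure_format(bs->drv);   /* leaf */
    }
    return demo_leaves_are_protocols(bs->file)
        && demo_leaves_are_protocols(bs->backing_hd);
}
```

A raw node on top of a file node passes the check; a raw node with no
children does not.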


In general, a block driver needs some arguments to create an instance.
The current code provides two BlockDriver methods for that:

* bdrv_open() takes a flags argument.

* bdrv_file_open() takes a flags argument and a filename argument.

This is woefully inadequate for anything but the simplest block drivers.
Any driver taking more complex arguments has to extract them out of the
"filename".

Example: http extracts url and optional readahead.

A saner interface would pass flags and a suitable argument dictionary
such as QemuOpts.
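
A rough sketch of what "flags plus an argument dictionary" could look
like.  This is not QemuOpts; DemoOpts, demo_opts_get() and
demo_http_open() are hypothetical names invented for illustration, and
the option names "url" and "readahead" are merely taken from the http
example above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical argument dictionary: a flat array of key/value strings. */
typedef struct DemoOpt {
    const char *key;
    const char *value;
} DemoOpt;

typedef struct DemoOpts {
    const DemoOpt *items;
    size_t count;
} DemoOpts;

/* Look up key, returning fallback if it is absent. */
const char *demo_opts_get(const DemoOpts *opts, const char *key,
                          const char *fallback)
{
    for (size_t i = 0; i < opts->count; i++) {
        if (strcmp(opts->items[i].key, key) == 0) {
            return opts->items[i].value;
        }
    }
    return fallback;
}

/* Hypothetical per-driver state for an http-like driver. */
typedef struct DemoHttpState {
    const char *url;
    long readahead;
} DemoHttpState;

/* The driver takes named arguments; nothing is parsed out of a "filename". */
int demo_http_open(DemoHttpState *s, int flags, const DemoOpts *opts)
{
    (void)flags;                    /* unused in this sketch */
    const char *url = demo_opts_get(opts, "url", NULL);
    if (!url) {
        return -1;                  /* url is mandatory */
    }
    s->url = url;
    s->readahead = strtol(demo_opts_get(opts, "readahead", "0"), NULL, 10);
    return 0;
}
```

The point of the design is that each driver declares which keys it
understands, so no driver needs its own ad hoc string encoding.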


Now, let's review our existing interface to create such a tree of block
drivers.  Beware, royal mess ahead.

There are two interfaces.  The first one is bdrv_open().  It takes three
arguments: filename, flags and an optional block driver argument.

If flag BDRV_O_SNAPSHOT is set, we do snapshot magic.  Omitted here in
an attempt to protect reader sanity.

If the block driver is missing, we guess one.  More on that below.

The block driver is instantiated to set up the root of the tree.  Let's
call it the root block driver, and its instance bs.

If the root block driver provides method bdrv_file_open(), it is used,
and gets the flags and filename argument.

Else, we first instantiate *another* driver.

    We use the second interface for that: bdrv_file_open().  It takes
    filename and flags arguments like bdrv_open(), but no block driver
    argument.

    It chooses the block driver by looking at filename.  If filename
    names a host device, use the protocol for hauling that device's
    bits.  If it starts with P:, where P is some driver's
    "protocol_name", use that driver.  Else fail.  Except I just lied;
    the actual rules are messier than that.

    Unlike bdrv_open(), bdrv_file_open() ignores flag BDRV_O_SNAPSHOT,
    and always behaves as if flag BDRV_O_NO_BACKING was set.
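
    The P: lookup described above can be sketched like this.  It follows
    only the simplified rule (prefix match, else fail); the host device
    check and the messier real rules are left out.  DemoDriver and
    demo_find_protocol() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver record: only the protocol name matters here. */
typedef struct DemoDriver {
    const char *protocol_name;
} DemoDriver;

/*
 * Return the driver whose protocol_name equals the text before the first
 * colon in filename, or NULL if there is no colon or no such driver.
 */
const DemoDriver *demo_find_protocol(const DemoDriver *drivers, size_t n,
                                     const char *filename)
{
    const char *colon = strchr(filename, ':');
    if (!colon) {
        return NULL;
    }
    size_t len = (size_t)(colon - filename);
    for (size_t i = 0; i < n; i++) {
        const char *p = drivers[i].protocol_name;
        if (strlen(p) == len && strncmp(p, filename, len) == 0) {
            return &drivers[i];
        }
    }
    return NULL;
}
```

    Note that the prefix is consumed purely by string matching, so a
    file that merely has a colon in its name is treated as a driver
    selection.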

We store the instance in bs->file.

Then the root block driver is instantiated with method bdrv_open().  It
gets the flags argument.  It stacks on top of bs->file, but that's mere
convention.

Note: one of the code's ideas on format vs. protocol is "protocols
provide bdrv_file_open(), formats do not".  I don't think that idea is
helpful.

The root block driver may ask for a backing file.  To do that, it sets
bs->backing_filename and optionally bs->backing_format, both strings.

Example: qcow2 reads the two strings from the image header.

We instantiate the backing file bs->backing_hd with bdrv_open().
Recursion.  Arguments: bs->backing_filename, flags derived from our own
flags argument, and the driver named by bs->backing_format.  If
bs->backing_format is unset, pick one just like -drive does when its
format option is unset.

The root block driver stacks on top of bs->backing_hd, by convention.

Flag BDRV_O_NO_BACKING suppresses backing file setup, but let's ignore
that here.
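
The control flow described so far, condensed into a toy model.  All
names are hypothetical (Demo* types, demo_open(), the three example
drivers), the flag value is made up, snapshot handling and driver
guessing are omitted, the protocol that bdrv_file_open() would pick is
simply passed in as "proto", and the backing file gets our flags
unchanged rather than derived ones.

```c
#include <stddef.h>
#include <stdlib.h>

enum { DEMO_O_NO_BACKING = 1 };     /* hypothetical value */

typedef struct DemoState DemoState;

typedef struct DemoDriver {
    const char *name;
    int (*file_open)(DemoState *bs, const char *filename, int flags);
    int (*open)(DemoState *bs, int flags);
} DemoDriver;

struct DemoState {
    const DemoDriver *drv;
    DemoState *file;
    DemoState *backing_hd;
    const char *backing_filename;   /* set by the driver's open, if any */
    const DemoDriver *backing_drv;
};

void demo_close(DemoState *bs)
{
    if (!bs) {
        return;
    }
    demo_close(bs->file);
    demo_close(bs->backing_hd);
    free(bs);
}

/* Second interface: no driver argument, never sets up a backing file. */
DemoState *demo_file_open(const char *filename, int flags,
                          const DemoDriver *proto)
{
    DemoState *bs = calloc(1, sizeof(*bs));
    if (!bs) {
        return NULL;
    }
    bs->drv = proto;
    if (!proto->file_open || proto->file_open(bs, filename, flags) < 0) {
        free(bs);
        return NULL;
    }
    return bs;
}

/* First interface: instantiate drv as root, stacking as needed. */
DemoState *demo_open(const char *filename, int flags,
                     const DemoDriver *drv, const DemoDriver *proto)
{
    DemoState *bs = calloc(1, sizeof(*bs));
    if (!bs) {
        return NULL;
    }
    bs->drv = drv;
    if (drv->file_open) {
        if (drv->file_open(bs, filename, flags) < 0) {
            goto fail;
        }
    } else {
        bs->file = demo_file_open(filename, flags, proto);
        if (!bs->file || !drv->open || drv->open(bs, flags) < 0) {
            goto fail;
        }
    }
    if (bs->backing_filename && !(flags & DEMO_O_NO_BACKING)) {
        /* Recursion: the backing file is opened like any other image. */
        bs->backing_hd = demo_open(bs->backing_filename, flags,
                                   bs->backing_drv, proto);
        if (!bs->backing_hd) {
            goto fail;
        }
    }
    return bs;
fail:
    demo_close(bs);
    return NULL;
}

/* Three toy drivers: a protocol, a plain format, a COW format. */
static int demo_file_file_open(DemoState *bs, const char *filename, int flags)
{
    (void)bs; (void)filename; (void)flags;
    return 0;
}

static int demo_raw_open(DemoState *bs, int flags)
{
    (void)bs; (void)flags;
    return 0;
}

const DemoDriver demo_file = { "file", demo_file_file_open, NULL };
const DemoDriver demo_raw = { "raw", NULL, demo_raw_open };

static int demo_cow_open(DemoState *bs, int flags)
{
    (void)flags;
    bs->backing_filename = "base.img";  /* pretend this came from the header */
    bs->backing_drv = &demo_raw;
    return 0;
}

const DemoDriver demo_cow = { "cow", NULL, demo_cow_open };
```

Opening with demo_cow as root and demo_file as proto yields a root with
bs->file and bs->backing_hd both set, the latter being a raw node on top
of its own file node.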

This provides for common stacking, but it's not general.  Block drivers
can and do instantiate other block drivers on their own, for their
stacking needs.

Example: blkdebug instantiates bs->file with bdrv_file_open().  It
passes on its flags argument and the part of its filename argument it
doesn't use itself.


What could a saner interface look like?

An obvious interface for building trees lets you build bottom up: tree
node constructor takes children and whatever other arguments it needs.

COW backing files complicate matters.  We need to open the COW to find
its backing file information.  I'd build a tree without the backing file
normally, read the backing file information, create the tree for the
backing file, and attach it to the COW node.
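
A sketch of that bottom-up order, under the assumption that the node
constructor is handed its child and that the backing file is attached in
a separate, later step.  DemoNode and the demo_node_*() functions are
hypothetical; the backing_filename argument stands in for whatever the
COW driver would read from the image header.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct DemoNode {
    const char *driver;
    struct DemoNode *file;          /* child given to the constructor */
    struct DemoNode *backing;       /* attached later, if at all */
    const char *backing_filename;   /* as read from the header, or NULL */
} DemoNode;

/* Constructor: takes the already-built child plus other arguments. */
DemoNode *demo_node_new(const char *driver, DemoNode *file,
                        const char *backing_filename)
{
    DemoNode *n = calloc(1, sizeof(*n));
    if (!n) {
        return NULL;
    }
    n->driver = driver;
    n->file = file;
    n->backing_filename = backing_filename;
    return n;
}

/* Attach a separately built backing tree to a COW node. */
int demo_node_attach_backing(DemoNode *cow, DemoNode *backing)
{
    if (!cow->backing_filename || cow->backing) {
        return -1;      /* node wants no backing file, or already has one */
    }
    cow->backing = backing;
    return 0;
}

void demo_node_free(DemoNode *n)
{
    if (!n) {
        return;
    }
    demo_node_free(n->file);
    demo_node_free(n->backing);
    free(n);
}
```

The caller builds the COW tree, inspects cow->backing_filename, builds
the backing tree the same way, and only then attaches it.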


Next, let's review the encoding of the filename argument.  It is decoded
in the block driver bdrv_file_open() methods.  Every block driver has
its own ad hoc encoding.

Example: file interprets it as a filename.

Example: nbd parses "nbd:" [ "unix:" filename | host ":" port ]
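
For illustration, a small parser for exactly the grammar quoted above.
This is not the nbd driver's code; DemoNbdTarget and demo_parse_nbd()
are hypothetical, and the 255-character limit and the port range check
are my own assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoNbdTarget {
    int is_unix;                /* 1: path_or_host is a socket path */
    char path_or_host[256];
    int port;                   /* only meaningful if !is_unix */
} DemoNbdTarget;

/* Parse "nbd:" [ "unix:" filename | host ":" port ]; 0 on success, else -1. */
int demo_parse_nbd(const char *filename, DemoNbdTarget *out)
{
    if (strncmp(filename, "nbd:", 4) != 0) {
        return -1;
    }
    const char *rest = filename + 4;
    memset(out, 0, sizeof(*out));

    if (strncmp(rest, "unix:", 5) == 0) {
        rest += 5;
        if (*rest == '\0' || strlen(rest) >= sizeof(out->path_or_host)) {
            return -1;
        }
        strcpy(out->path_or_host, rest);
        out->is_unix = 1;
        return 0;
    }

    const char *colon = strchr(rest, ':');
    if (!colon || colon == rest) {
        return -1;                      /* no port, or empty host */
    }
    size_t hostlen = (size_t)(colon - rest);
    if (hostlen >= sizeof(out->path_or_host)) {
        return -1;
    }
    if (colon[1] < '0' || colon[1] > '9') {
        return -1;                      /* port must start with a digit */
    }
    char *end;
    long port = strtol(colon + 1, &end, 10);
    if (*end != '\0' || port < 1 || port > 65535) {
        return -1;
    }
    memcpy(out->path_or_host, rest, hostlen);
    out->path_or_host[hostlen] = '\0';
    out->port = (int)port;
    return 0;
}
```

Even this tiny grammar is ambiguous: a host that happens to be called
"unix" cannot be expressed, because "nbd:unix:1234" parses as a socket
path.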

Additionally, bdrv_file_open() recognizes P: (see above).  This breaks
when the block driver's encoding is incompatible with that.

Examples:

    bdrv_open() arguments           behavior
    filename        block driver
    scruffy:duck    none            fails: no driver named "scruffy"
    scruffy:duck    bdrv_raw        fails: no driver named "scruffy"
    scruffy:duck    bdrv_file       bdrv_file uses file "scruffy:duck"
    fat:duck        none            bdrv_raw stacks onto
                                    bdrv_vvfat uses directory "duck"
    fat:duck        bdrv_raw        bdrv_raw stacks onto
                                    bdrv_vvfat uses directory "duck"
    fat:duck        bdrv_file       bdrv_file uses file "fat:duck"

Bizarre, isn't it?

More examples: try to use a qcow2 image named "fat:duck"

    bdrv_open() arguments           behavior
    filename        block driver
    fat:duck        qcow2           bdrv_qcow2 stacks onto
                                    bdrv_vvfat uses directory "duck"
                                    fails: vvfat doesn't provide a
                                    qcow2 image
    file:fat:duck   qcow2           bdrv_qcow2 stacks onto
                                    bdrv_file uses file "file:fat:duck"

Close, but no cigar.


-drive & friends expose this mess in the user interface as follows:

* They use bdrv_open().

* Option format selects its block driver argument.  It need not be a
  format.  Any block driver will do.  Pearls like "format=file" confuse
  users (What format is "file"?  And how does it differ from "raw"?).
  Note that you need format=file if you have colons in your filenames.

* Option file is the filename argument.  It's not really a filename, but
  an encoding of block driver name and arguments.

  If the block driver selected by format makes bdrv_open() instantiate a
  second block driver (because it wants to stack on it), then this
  argument also selects that block driver.  But you can only select
  block drivers that support the funny colon syntax.

* Options snapshot, cache, aio, readonly are combined into the flags
  argument.
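
How the last item might look, as a sketch only.  The DEMO_O_* names and
values are made up for this example and are not the BDRV_O_* constants;
which option value maps to which flag is my assumption, not something
this message specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, not the real BDRV_O_* values. */
enum {
    DEMO_O_RDWR       = 1 << 0,
    DEMO_O_SNAPSHOT   = 1 << 1,
    DEMO_O_NOCACHE    = 1 << 2,
    DEMO_O_CACHE_WB   = 1 << 3,
    DEMO_O_NATIVE_AIO = 1 << 4
};

/* Hypothetical parsed form of the four user-visible options. */
typedef struct DemoDriveOpts {
    bool snapshot;
    bool readonly;
    const char *cache;          /* e.g. "none", "writeback", or NULL */
    const char *aio;            /* e.g. "native", or NULL */
} DemoDriveOpts;

/* Fold the separate options into a single flags word. */
int demo_drive_flags(const DemoDriveOpts *o)
{
    int flags = 0;

    if (!o->readonly) {
        flags |= DEMO_O_RDWR;
    }
    if (o->snapshot) {
        flags |= DEMO_O_SNAPSHOT;
    }
    if (o->cache && strcmp(o->cache, "none") == 0) {
        flags |= DEMO_O_NOCACHE;
    } else if (o->cache && strcmp(o->cache, "writeback") == 0) {
        flags |= DEMO_O_CACHE_WB;
    }
    if (o->aio && strcmp(o->aio, "native") == 0) {
        flags |= DEMO_O_NATIVE_AIO;
    }
    return flags;
}
```

Once folded into flags, the options travel down the whole tree together,
which is why every node sees the same settings.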

We need to think about a saner user interface, but I figure this message
is already plenty long, so I'll stop here.



[*] It must provide bdrv_file_open(), or else death by infinite
recursion (I think).


