qemu-block
Re: [Qemu-block] [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmap


From: Denis V. Lunev
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmaps specification
Date: Thu, 8 Oct 2015 23:56:42 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 10/08/2015 11:28 PM, John Snow wrote:

On 10/07/2015 03:05 PM, Denis V. Lunev wrote:
On 10/07/2015 07:47 PM, Max Reitz wrote:
On 05.09.2015 18:43, Vladimir Sementsov-Ogievskiy wrote:
Persistent dirty bitmaps will be saved into qcow2 files. It may be used
as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
other drives (there may be qcow2 file with zero disk size but with
several dirty bitmaps for other drives).

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
   docs/specs/qcow2.txt | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++-
   1 file changed, 126 insertions(+), 1 deletion(-)
Overall: I'm strongly against putting dirty bitmaps into qcow2 files, at
least not as it is envisioned by this series.


If you don't feel like reading why, and you'd rather read what I'd do if
you really, really want to put them into qcow2 files, skip ahead until
the "RANT OVER" line.


The first indication of why that is the case is that this patch does not
add any explanation to the qcow2 specification of what these dirty bitmaps
are. Therefore, they are basically just binary data that is given a
name and dumped into a qcow2 file as if it were a tar file.

One could argue that this is qemu and we know what dirty bitmaps are.
But qcow2.txt is located in docs/specs/, not just in docs/. It is not an
explanation, but a *specification*, and as such it should explain
everything related to qcow2.

As a side note, we already have a binary data dump in qcow2 files, and
that is the VM state. This is bad enough, and if it had been up to
me, it would never have been there. That's because it's something only
qemu can make use of, and not even different versions of qemu are
compatible there, so it was (in my opinion) a pretty bad idea to put it
into qcow2.

So what this specification is definitely lacking is an explanation on
how any independent program (i.e. *not qemu*) is to interpret the dirty
bitmaps. I do believe this is possible, as opposed to the VM state. The
VM state, nobody can do anything with it, it's even difficult for qemu
itself sometimes.

So let's imagine this specification would contain an explanation on what
dirty bitmaps are and what they mean. Actually, now that I think about
it, I cannot really imagine it, because I'm lacking that explanation.
What do they mean? As far as I can see from the series, they actually
don't mean anything. It's just a dump of data into a qcow2 file, and it
can be any bitmap, be it associated with the file itself or not.

This is further pointed to by your feature proposal "Allow qcow2 images
without l1_table and other staff but only with dirty bitmaps with
minimum overhead". There is a file format for exactly that, and it's
called tar (yes, you are missing some metadata, but just add a JSON
description file to the archive and you're done).

By the way: I heard John briefly touch this in his talk at KVM Forum
when he explained that this would make qcow2 files something like better
tar files, and I didn't like the idea back then either. I was hoping
that it would actually be different, and was waiting for some
discussion to appear, but I didn't notice this series, because it
doesn't have "qcow2" in the cover letter's subject (and I wasn't CC'd,
but I don't really see why I should have been, as I'm not mentioned in
the MAINTAINERS file (what a lucky man I am!)). I only just noticed
today when I saw a lone reply from John on qemu-block to a patch with a
"qcow2:" prefix.

So, what you are apparently planning to do is to dump dirty bitmaps into
any available qcow2 file. If the image you are operating on is a qcow2
file, great! If it isn't, you create some empty qcow2 file and dump the
bitmaps there.

Then, I'm asking myself why you don't use tar files in the second case,
and then, why you don't use tar files in the first case. I do remember
John saying that there was a discussion about it, but I don't know about
it, so I don't know why you dropped that idea in favor of making qcow2
files tar archives. The only reason I can think of off the top of my
head is that we have infrastructure for reading qcow2 files, but not for
tar files. However, this series is like just appending a tar file to a
qcow2 file, and then implementing a reader for tar archives inside of
the qcow2 driver, so it doesn't seem to be much simpler in practice.

In any case, if my assumptions so far are more or less correct, no
outside program can do anything with the dirty bitmaps contained in the
qcow2 file, because they are just binary data which does not necessarily
have any connection to the qcow2 file itself. Not even qemu can make
sense of them, it appears, it needs the user or the management tool to
do so.

I am strongly against putting binary data into a qcow2 file which does
not have any visible connection to the file's contents.

Obviously, it is possible that there is some connection which I am just
not seeing, though.



--- RANT OVER ---

Okay, that was enough destructive criticism, now to get some
constructive arguments and ideas.

So, there are two points I don't like: First, it's binary data which
isn't explained in the qcow2 specification. This can easily be fixed.

Second, there is no obvious connection between the qcow2 file and a
dirty bitmap. I'd drop the idea of "If you use anything else than qcow2,
we create an empty qcow2 file and put the dirty bitmaps there". Please
don't do that. If you are using something else and want this feature,
that's your problem. If you need features, you use qcow2. That's it. If
you really want to support it for other file formats, put the data into
tar archives and not into qcow2 files.

For comparison, this is like using a qcow2 file for implementing backing
files for raw images. The cluster offsets in L2 tables would then point
to offsets in the raw image (and the host offset would have to match the
guest offset), and by looking at which L2 table entries are unused, one
could deduce which sectors are to be read from the backing file. We
don't support that either, because you should just use qcow2 if you want
backing files.

Next we need to know for every dirty bitmap what the reference disk is.
Since generally that reference disk is stored in some image file
somewhere, I'd add a filename for each of the dirty bitmaps which is the
base file in respect to which these clusters are considered dirty.

As a measure of how well you have associated a dirty bitmap
with a qcow2 file, imagine the following scenario: You are writing a
program independent of qemu, and that program is to make use of the
dirty bitmaps for incremental backups.

With my proposal above, it would open the qcow2 file and pick some
bitmap based on name, base image, user choice or maybe some property of
the bitmap itself (e.g. lowest dirty bit count). Then, it would create a
new overlay file (the backup image), let's say a qcow2 file, and use the
base image filename of the selected dirty bitmap as the filename of the
backing file for the backup image. Then, it would copy all dirty
clusters from the original qcow2 file to the backup image, and that's it.
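
A minimal sketch of that scenario in Python, for illustration only. It assumes semantics that the patch itself does not spell out: bit i of the bitmap covers the i-th granularity-sized chunk of the guest disk, bits are numbered from the least significant bit of each byte, and a set bit means "changed relative to the base image". The names dirty_ranges and copy_dirty are hypothetical, and plain file-like objects stand in for real image files.

```python
import io


def dirty_ranges(bitmap: bytes, granularity: int, disk_size: int):
    """Yield (offset, length) for every chunk whose bit is set.

    Assumption: bit i (LSB-first within each byte) covers guest bytes
    [i * granularity, (i + 1) * granularity).
    """
    nb_chunks = (disk_size + granularity - 1) // granularity
    for i in range(nb_chunks):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            offset = i * granularity
            # The last chunk may be shorter than the granularity.
            yield offset, min(granularity, disk_size - offset)


def copy_dirty(src, dst, bitmap: bytes, granularity: int, disk_size: int) -> int:
    """Copy only the dirty chunks from src to dst; return bytes copied."""
    copied = 0
    for offset, length in dirty_ranges(bitmap, granularity, disk_size):
        src.seek(offset)
        dst.seek(offset)
        dst.write(src.read(length))
        copied += length
    return copied


# Toy example: a 14-byte "disk" with 4-byte granularity, chunks 1 and 3 dirty.
src = io.BytesIO(b"AAAABBBBCCCCDD")
dst = io.BytesIO(bytes(14))
copy_dirty(src, dst, bytes([0b1010]), granularity=4, disk_size=14)
```

A real tool would write into an overlay whose backing file is the base image; that step is left out because it depends on the base-image reference discussed here.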

Right now, with this patch alone, the tool has no idea what the base
image is, and some bitmaps may not even be related to the very qcow2
file they are in at all.


With that fixed, I could be moved to accept the concept of dirty bitmaps
in qcow2 files grudgingly. Maybe happily, if you give me a good reason
why we should not put them into tar files.


And I have some other comments in regards to the specification:

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 121dfc8..5fc0365 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -103,7 +103,13 @@ in the description of a field.
                       write to an image with unknown auto-clear
features if it
                       clears the respective bits from this field first.
   -                    Bits 0-63:  Reserved (set to 0)
+                    Bit 0:      Dirty bitmaps bit. If this bit is
set then
+                                there is a _consistent_ Dirty
bitmaps extension
+                                in the image. If it is not set, but
there is a
+                                Dirty bitmaps extension, its data
should be
+                                considered as inconsistent.
+
+                    Bits 1-63:  Reserved (set to 0)
              96 -  99:  refcount_order
                       Describes the width of a reference count block
entry (width
@@ -123,6 +129,7 @@ be stored. Each extension has a structure like
the following:
                           0x00000000 - End of the header extension area
                           0xE2792ACA - Backing file format name
                           0x6803f857 - Feature name table
+                        0x23852875 - Dirty bitmaps
                           other      - Unknown header extension, can
be safely
                                        ignored
   @@ -166,6 +173,24 @@ the header extension data. Each entry look
like this:
                       terminated if it has full length)
     +== Dirty bitmaps ==
+
+Dirty bitmaps is an optional header extension. It provides an
ability to store
+dirty bitmaps in a qcow2 image. The fields are:
+
+          0 -  3:  nb_dirty_bitmaps
+                   The number of dirty bitmaps contained in the
image. Valid
+                   values: 0 - 65535.
Why? Because that's what qemu supports? That's not a real reason. If so,
you may make a note of that (see the cluster_bits documentation), or
just omit it; for years, qemu only supported refcount_order = 4, but the
specification did not make a note of that. It was just a limitation of
qemu, but not of the format.

+
+          4 -  7:  dirty_bitmap_directory_size
+                   Size of the Dirty Bitmap Directory in bytes.
Valid values:
+                   0 - 67108864 (= 1024 * nb_dirty_bitmaps).
Same here.

+
+          8 - 15:  dirty_bitmap_directory_offset
+                   Offset into the image file at which the Dirty Bitmap
+                   Directory starts. Must be aligned to a cluster
boundary.
+
+
   == Host cluster management ==
     qcow2 manages the allocation of host clusters by maintaining a
reference count
@@ -360,3 +385,103 @@ Snapshot table entry:
             variable:   Padding to round up the snapshot table entry
size to the
                       next multiple of 8.
+
+
+== Dirty bitmaps ==
+
+The feature supports storing dirty bitmaps in a qcow2 image.
I think I've made my point clear enough in the huge wall of text above,
but I'll just repeat it once more: This should explain what dirty
bitmaps are and how they are to be interpreted.

+
+=== Cluster mapping ===
+
+Dirty bitmaps are stored using a ONE-level structure for the mapping of
+bitmaps to host clusters. It is called Dirty Bitmap Table.
+
+The Dirty Bitmap Table has a variable size (stored in the Dirty Bitmap
+Directory Entry) and may use multiple clusters, however it must be
contiguous
+in the image file.
+
+Given an offset (in bytes) into the bitmap, the offset into the image file can
+be obtained as follows:
+
+    byte_offset =
+        dirty_bitmap_table[offset / cluster_size] + (offset % cluster_size)
+
+Taking into account the granularity of the bitmap, an offset in bits into the
+image file can be obtained like this:
+
+    bit_offset =
+        byte_offset(bit_nr / granularity / 8) * 8 + (bit_nr / granularity) % 8
+
+Here bit_nr is a number of "virtual" bit of the bitmap, which is covered by
+"physical" bit with number (bit_nr / granularity).
+Dirty Bitmap Table entry:
+
+    Bit  0 -  8:    Reserved
+
+         9 - 55:    Bits 9-55 of host cluster offset. Must be
aligned to a
+                    cluster boundary. If the offset is 0, the
cluster is
+                    unallocated, and should be read as all zeros.
+
+        56 - 63:    Reserved
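
(Illustrative aside, not part of the patch.) The lookup described in this hunk can be sketched in Python as follows. The mask is derived from "bits 9-55" of the table entry; the function names are hypothetical, the table is assumed to be already loaded as a list of integers, and the order of bits within a byte is left open because the patch does not define it.

```python
# Bits 9-55 of a Dirty Bitmap Table entry hold the host cluster offset.
OFFSET_MASK = 0x00FFFFFFFFFFFE00


def host_byte_offset(table, cluster_size, offset):
    """Map a byte offset into the bitmap data to an image-file offset.

    Returns None for an unallocated cluster (offset field 0), which the
    patch says should be read as all zeros.
    """
    cluster = table[offset // cluster_size] & OFFSET_MASK
    if cluster == 0:
        return None
    return cluster + offset % cluster_size


def locate_bit(table, cluster_size, granularity_bits, bit_nr):
    """Return (image-file byte offset, bit index within that byte)
    for the "physical" bit that covers "virtual" bit bit_nr."""
    phys = bit_nr >> granularity_bits  # bit_nr / granularity
    return host_byte_offset(table, cluster_size, phys // 8), phys % 8
```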
+
+=== Dirty Bitmap Directory ===
+
+Each dirty bitmap, saved in the image is described in the Dirty Bitmap
+Directory entry. Dirty Bitmap Directory is a contiguous area in the
image file,
+whose starting offset and length are given by the header extension
fields
+dirty_bitmap_directory_offset and dirty_bitmap_directory_size. The
entries of
+the bitmap directory have variable length, depending on the length
of the
+bitmap name.
+
+Dirty Bitmap Directory Entry:
+
+    Byte 0 -  7:    dirty_bitmap_table_offset
+                    Offset into the image file at which the Dirty
Bitmap Table
+                    for the bitmap starts. Must be aligned to a cluster
+                    boundary.
+
+         8 - 15:    nb_virtual_bits
+                    Number of "virtual" bits in the bitmap. Number of
+                    "physical" bits would be:
+                    (nb_virtual_bits + granularity - 1) / granularity
+
+        16 - 19:    dirty_bitmap_table_size
+                    Number of entries in the Dirty Bitmap Table of
the bitmap.
+                    Valid values: 0 - 0x8000000.
+                    Also, (dirty_bitmap_table_size * cluster_size)
should not
+                    be greater than 0x20000000 (512 MB)
Again, is this a qemu limitation or is there another reason? Also, you
should decide between the two limitations. The second one automatically
limits the number of values to 0 - 1048575 at maximum (512 byte
clusters).
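
The arithmetic behind that remark can be checked with a few lines of Python. This is only a sketch: max_table_entries is a hypothetical helper, and the cluster sizes in the loop are example values.

```python
MAX_TABLE_ENTRIES = 0x8000000   # first limit from the patch
MAX_TABLE_BYTES = 0x20000000    # second limit from the patch (512 MB)


def max_table_entries(cluster_size: int) -> int:
    """Largest dirty_bitmap_table_size that satisfies both limits."""
    return min(MAX_TABLE_ENTRIES, MAX_TABLE_BYTES // cluster_size)


for cluster_size in (512, 65536, 2 * 1024 * 1024):
    print(cluster_size, max_table_entries(cluster_size))
```

Even with 512-byte clusters the byte limit caps the table at about 2^20 entries, far below 0x8000000 (2^27), so the first limit never takes effect on its own.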

+
+        20 - 23:    granularity_bits
+                    Granularity bits. Valid values are: 0 - 63.
+
+                    Granularity is calculated as
+                        granularity = 1 << granularity_bits
+
+                    Granularity of the bitmap is how many "virtual"
bits
+                    accounts for one "physical" bit.
+
+        24 - 27:    flags
+                    Bit
+                      0: in_use
+                         The bitmap is in use and may be inconsistent.
What does "in use" mean? You are not supposed to use a qcow2 file which
is in use by qemu anyway.

+
+                      1: self
+                         The bitmap is a dirty bitmap for the
containing image.
As I said, I don't see why we should support this ever being not set, so
I am very much in favor of dropping this.

+
+                      2: auto
+                         The bitmap should be autoloaded as block
dirty bitmap.
+                         Only available if bit 1 (self) is set.
The phrasing is too qemu-specific. Remember that this is not an
explanation for how qemu is to interpret qcow2 files, but a
*specification* of qcow2 files for *any* tool.

So if I understand the intention behind this flag, a more general
expression would be "The default bitmap". Then it is qemu's decision to
always auto-load this default bitmap.

+
+                      3: read_only
+                         The bitmap should not be rewritten.
+
+                    Bits 4 - 31 are reserved.
+
+        28 - 29:    name_size
+                    Size of the bitmap name. Valid values: 0 - 1023.
+
+        variable:   The name of the bitmap (not null terminated).
+
+        variable:   Padding to round up the Dirty Bitmap Directory
Entry size to
+                    the next multiple of 8.
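
(Illustrative aside, not part of the patch.) To see what an independent tool would have to do with the layout proposed above, here is a minimal Python parser for one Dirty Bitmap Directory Entry. It is a sketch under assumptions: fields are big-endian like the rest of qcow2, the name is decoded as UTF-8 (the patch does not name an encoding), and an entry whose length is already a multiple of 8 gets no padding. The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from enum import IntFlag


class BitmapFlags(IntFlag):
    IN_USE = 1 << 0
    SELF = 1 << 1
    AUTO = 1 << 2
    READ_ONLY = 1 << 3


# Fixed part of an entry: u64 table offset, u64 nb_virtual_bits,
# u32 table size, u32 granularity_bits, u32 flags, u16 name_size.
FIXED = struct.Struct(">QQIIIH")


@dataclass
class DirectoryEntry:
    table_offset: int
    nb_virtual_bits: int
    table_size: int
    granularity_bits: int
    flags: BitmapFlags
    name: str


def parse_entry(buf: bytes, pos: int = 0):
    """Parse one entry at pos; return (entry, position of the next entry)."""
    t_off, nbits, t_size, gran_bits, flags, name_size = FIXED.unpack_from(buf, pos)
    name_start = pos + FIXED.size
    name = buf[name_start:name_start + name_size].decode("utf-8")
    entry_len = FIXED.size + name_size
    entry_len += -entry_len % 8  # round up to a multiple of 8
    entry = DirectoryEntry(t_off, nbits, t_size, gran_bits,
                           BitmapFlags(flags), name)
    return entry, pos + entry_len
```

Note that nothing in the parsed entry says which disk the bitmap refers to unless the self flag is set, which is the gap the comments in this review are about.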

The interesting thing is that I have written a huge wall of text above
and all my comments (except for "just put it into tar") can be addressed
relatively easy. Just add documentation for what dirty bitmaps are, and
a "variable: base_filename" field here, and that would be it.

But there is a reason why I'm keeping the wall of text there: I feel
like while these are very minor changes, they are fundamental design
differences. Without these changes, you just add a binary data dump
extension to qcow2, which is of no use to anyone but qemu (and not even
qemu alone, it needs the user or a management tool to tell it what to do
with it, unless the @auto flag is set).

With these changes, it suddenly actually becomes an integral part of the
qcow2 file which can be interpreted and used in a meaningful way by
tools other than qemu itself.

Max

This is actually not a very big deal from my point of view, if it
brings us into agreement and allows us to proceed further. The bitmap will
be available in QCOW2; a Parallels image can also have a bitmap inside,
though there is no code for that on the QEMU side yet.

This would be enough for me for a while.

Thus the question is on John's side: whether the "bitmap in separate file"
feature is really necessary. This is mostly an API question.

Den
(NB: I never got Max's original reply, so this reply is more to Max than
to Denis or Vladimir.)

I'll see your wall of text and raise you my own wall of text...

We consider the ability to use persistent bitmaps to create incremental
backups for non-qcow2 images to be a necessary and vital component of
a complete incremental backup solution, especially considering the core
mechanism of the feature does not really rely on qcow2 for anything
outside of convenience (e.g. backing files.)

Vladimir's initial proposal of using .qcow2 to store the bitmap sounded
good to me, because I wanted to be able to store the bitmap in a qcow2
file anyway, and expanding the specification to allow it to store
/arbitrary/ bitmaps seemed like a natural fit to accomplish both goals
with a minimum of coding.

However, Max has raised some pretty good points here -- allow me to
paraphrase his Wall Of Text™:

- Since this patch is a modification of the qcow2 /specification/ which
is used by more than just QEMU, we must take care to avoid QEMU-isms
and QEMU limitations and design a more universal approach to the
specification addendum.

- Specific caps on the number of bitmaps, the granularity of said
bitmaps, and the resultant size of said bitmaps should be addressed in
terms of the spec, not in terms of what's necessarily convenient or
sufficient for QEMU. I think this point will be easy to address with
some better spec wording.

- The bitmap language in the spec is generic and doesn't refer much to
anything. This is partially my fault, as I believe I likely guided
Vladimir towards using generic language that was tied more to the
HBitmap format than towards our specific implementation
(BdrvDirtyBitmap.) I recognize this as a bit of a misguided effort on my
part to keep things "generic," but what I succeeded in doing was keeping
it "useless" outside of QEMU. Example: "number of virtual bits" is
meaningless, but "number of 512 byte sectors" is not.

The last bit is the crux of our problem and the most deserving of our
attention:

- For a bitmap to be useful to an application outside of QEMU, all of
the necessary information for interpreting that bitmap must either be
present within the file or referenced. For bitmaps that describe the
file they are stored in, this is trivial with some specification editing.

For bitmaps stored for /other/ files, this gets... trickier. What is
this a bitmap for? What does it describe? What data does it describe?

Node-names and drive names here are useless outside of QEMU and can of
course change between QEMU invocations or be shared between different
QEMU instances, so this is useless ...

We could store filenames, but networked devices and distributed
filesystems may have interesting relative pathnames that will not remain
reliable once the .qcow2 file is shuffled around or migrated, so storing
path-name references seems like a losing battle here, too. Maybe we only
have a file descriptor and no name at all -- what do we write for the
"global identifier that uniquely identifies the data we belong to"? Is
it even possible?

The only conclusion I can reach here is that storing bitmaps inside of a
.qcow2 that remain meaningful to external applications is not going to
be easily possible.

Perhaps we need to abandon the idea that we can store any bitmap we want
into a .qcow2.

However, I'm still a big fan of storing bitmaps that describe the data
they go alongside in the same .qcow2 as a convenience feature --
especially now since Vladimir has done the hard work for us all writing
the feature.

For simple use cases in non-managed environments the use case for
storing the bitmap inside the qcow2 it describes is pretty compelling:

- No extra files to track or manage
- The command line used to boot QEMU the first time can be used to boot
QEMU subsequently, and we get the persistent bitmap automatically
without further modification.
- Migration across a shared medium using .qcow2 files is trivial
- Backups managed by qcow2-unaware applications trivially bring along
our persistent bitmap data for us without additional configuration.

It's simple, the data is meaningful to external applications, and we've
got most of the code we need already, thanks to Vladimir.

Sadly, we still need a way to store bitmap data for files that do not
offer .bitmap_load and .bitmap_store primitives for us.

Presumably, if we devise our own "generic bitmap container" format, we
don't have to store things like node names, filenames, etc in this
container and we can use it to just store (name, granularity, size,
[data]) like we were trying to do in qcow2.

Matching bitmap IDs up to the data they belong to becomes the
responsibility of the user/management layer.
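
As a rough illustration of how little such a container would need to hold, here is a Python sketch of a (name, granularity, size, [data]) record. Everything about it is hypothetical: no such format exists in this thread, the field widths and byte order are arbitrary choices, and "granularity" and "size" are assumed to be in bytes.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: u16 name length, u64 granularity, u64 size,
# u64 data length, all big-endian.
HEADER = struct.Struct(">HQQQ")


@dataclass
class BitmapRecord:
    name: str
    granularity: int  # bytes of disk covered by one bit (assumption)
    size: int         # size of the described disk in bytes (assumption)
    data: bytes       # raw bitmap bits

    def pack(self) -> bytes:
        name = self.name.encode("utf-8")
        header = HEADER.pack(len(name), self.granularity, self.size,
                             len(self.data))
        return header + name + self.data

    @classmethod
    def unpack(cls, buf: bytes) -> "BitmapRecord":
        name_len, granularity, size, data_len = HEADER.unpack_from(buf)
        pos = HEADER.size
        name = buf[pos:pos + name_len].decode("utf-8")
        data = buf[pos + name_len:pos + name_len + data_len]
        return cls(name, granularity, size, data)
```

The record deliberately carries no node name or filename, so matching it to a drive is left to whoever holds the file.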

Where this gets hairy, perhaps, is how to enforce that the drive data
that belongs to this bitmap isn't modified without our say-so? How do we
detect de-sync? The quick answer might be to store a hash alongside the
bitmap, and upon being re-applied to the drive if the hash doesn't
match, we throw an error/complain/etc -- but what about cases -- again
-- where we don't necessarily have a file we can trivially hash, like a
many-gigs-wide raw file being mounted over a networked file-system?
"external bitmaps" appear to pose a very real desync risk.

I suppose we would have had that problem anyway with the
qcow2-as-container idea.
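
The "store a hash alongside the bitmap" idea could look roughly like the Python sketch below. It is an illustration under assumptions, not a proposal: SHA-256 is an arbitrary choice, the names image_digest, check_in_sync and BitmapDesyncError are hypothetical, and hashing the whole image is exactly the cost that makes this impractical for a very large raw file on a network filesystem.

```python
import hashlib
import io


class BitmapDesyncError(Exception):
    """The image no longer matches the digest stored with the bitmap."""


def image_digest(image, chunk_size: int = 1 << 20) -> str:
    """Hash the whole image in chunks (SHA-256 chosen arbitrarily)."""
    h = hashlib.sha256()
    image.seek(0)
    while chunk := image.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()


def check_in_sync(image, stored_digest: str) -> None:
    """Raise if the image changed since the bitmap was saved."""
    if image_digest(image) != stored_digest:
        raise BitmapDesyncError("image changed since the bitmap was saved")


# Toy example with an in-memory "image".
image = io.BytesIO(b"x" * 4096)
saved = image_digest(image)   # stored alongside the bitmap at save time
check_in_sync(image, saved)   # passes: nothing was modified
```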

Perhaps the best we can say is "This is kind of a dangerous feature to
use, use at your own peril!" and strongly recommend that external
bitmaps are used only in conjunction with a management interface.

That's about all of the thoughts I have on the matter currently.
Does anybody else have strong feelings on where we should go from here?

(A) Argue with Max and push for qcow2-as-container
(B) Use qcow2 for self-reference bitmaps only, use an external format
for formats that do not support .bitmap_load or .bitmap_store
(C) Forget about the qcow2 extension entirely, use only the new external
format
(D) Something else?

My vote is for (B), and if I can find a bit of consensus on that, we can
draft an internal-use specification for the file, but I am very wary of
how we will manage de-sync or whether we will be able to manage it at all.
("Your fault for touching this file when QEMU was not running.")

Thoughts?

--js
The better way is (A), if it is possible at all, but we can follow (B)
if (A) is not.

At least we know what to do. Frankly speaking, the only format that we
sadly really must support is the raw image, which has no obvious
container to keep the bitmap in.

Here are some arguments which may or may not be valuable to Max.

We have to have a bitmap inside the QCOW2 file for the reasons
listed above by John. They are really valuable. So far
we have been able to keep a lot of binary data inside the
image, and VM management has been really quite simple: we
just have to copy the image from one host to another. It seems
important to me to keep this feature rolling. Thus the bitmap
will stay inside the QCOW2 image.

Actually, everything else is a matter of external API: whether
or not we allow creating an image without data. End users
will use all their wits to trick us into saving a dirty bitmap
if bitmap-based backup becomes useful and they are not
using QCOW2.

Whether we are trying to stop a train at full speed with a sheet of
paper, that is, whether or not to prevent the creation of such files,
is a real question. Any additional external format will cost us a LOT of
effort, and I do not see a volunteer who will do this job;
this is the unfortunate side of things.

For my part, I am really uncomfortable dropping the work
done by Vladimir, for a lot of reasons, and one of
them is the time frame. We have already spent around 9 months
of work to get here. I am feeling like a father :)

Max, do you have the force with you to drive the creation of this
new format?

Anyway, we have all written several really lengthy letters.
Maybe it would be wise to discuss things verbally somehow?

Den


