qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmap


From: Denis V. Lunev
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmaps specification
Date: Wed, 7 Oct 2015 22:05:41 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 10/07/2015 07:47 PM, Max Reitz wrote:
On 05.09.2015 18:43, Vladimir Sementsov-Ogievskiy wrote:
Persistent dirty bitmaps will be saved into qcow2 files. It may be used
as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
other drives (there may be qcow2 file with zero disk size but with
several dirty bitmaps for other drives).

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
  docs/specs/qcow2.txt | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 126 insertions(+), 1 deletion(-)
Overall: I'm strongly against putting dirty bitmaps into qcow2 files, at
least not as it is envisioned by this series.


If you don't feel like reading why, and you'd rather read what I'd do if
you really, really want to put them into qcow2, files, skip ahead until
the "RANT OVER" line.


The first indication of why that is the case is that this patch does not
add any explanation to the qcow2 specification what these dirty bitmaps
are. Therefore, there are basically just binary data that is given a
name and dumped into a qcow2 file as if it were a tar file.

One could argue that this is qemu and we know what dirty bitmaps are.
But qcow2.txt is located in docs/specs/, not just in docs/. It is not an
explanation, but a *specification*, and as such it should explain
everything related to qcow2.

As a side notice, we already have a binary data dump in qcow2 files, and
that is the VM state. This is bad enough and if it would have been up to
me, it would have never been there. That's because it's something only
qemu can make use of, and not even different versions of qemu are
compatible there, so it was (in my opinion) a pretty bad idea to put it
into qcow2.

So what this specification is definitely lacking is an explanation on
how any independent program (i.e. *not qemu*) is to interpret the dirty
bitmaps. I do believe this is possible, as opposed to the VM state. The
VM state, nobody can do anything with it, it's even difficult for qemu
itself sometimes.

So let's imagine this specification would contain an explanation on what
dirty bitmaps are and what they mean. Actually, now that I think about
it, I cannot really imagine it, because I'm lacking that explanation.
What do they mean? As far as I can see from the series, they actually
don't mean anything. It's just a dump of data into a qcow2 file, and it
can be any bitmap, be it associated with the file itself or not.

This is further pointed to by your feature proposal "Allow qcow2 images
without l1_table and other staff but only with dirty bitmaps with
minimum overhead". There is a file format for exactly that, and it's
called tar (yes, you are missing some metadata, but just add a JSON
description file to the archive and you're done).

By the way: I heard John briefly touch this in his talk at KVM Forum
when he explained that this would make qcow2 files something like better
tar files, and I didn't like the idea back then either. I was hoping
that it would actually be differently, and was waiting for some
discussion to appear, but I didn't notice this series, because it
doesn't have "qcow2" in the cover letter's subject (and I wasn't CC'd,
but I don't really see why I should have been, as I'm not mentioned in
the MAINTAINERS file (what a lucky man I am!)). I only just noticed
today when I saw a lone reply from John on qemu-block to a patch with a
"qcow2:" prefix.

So, what you are apparently planning to do is to dump dirty bitmaps into
any available qcow2 file. If the image you are operating on is a qcow2
file, great! If it isn't, you create some empty qcow2 file and dump the
bitmaps there.

Then, I'm asking myself why you don't use tar files in the second case,
and then, why you don't use tar files in the first case. I do remember
John saying that there was a dicussion about it, but I don't know about
it, so I don't know why you dropped that idea in favor of making qcow2
files tar archives. The only reason I can think of off the top of my
head is that we have infrastructure for reading qcow2 files, but not for
tar files. However, this series is like just appending a tar file to a
qcow2 file, and then implementing a reader for tar archives inside of
the qcow2 driver, so it doesn't seem to be much simpler in practice.

In any case, if my assumptions so far are more or less correct, no
outside program can do anything with the dirty bitmaps contained in the
qcow2 file, because they are just binary data which does not necessarily
have any connection to the qcow2 file itself. Not even qemu can make
sense of them, it appears, it needs the user or the management tool to
do so.

I am strongly against putting binary data into a qcow2 file which does
not have any visible connection to the file's contents.

Obviously, it is possible that there is some connection which I am just
not seeing, though.



--- RANT OVER ---

Okay, that was enough destructive criticism, now to get some
constructive arguments and ideas.

So, there are two points I don't like: First, it's binary data which
isn't explained in the qcow2 specification. This can easily be fixed.

Second, there is no obvious connection between the qcow2 file and a
dirty bitmap. I'd drop the idea of "If you use anything else than qcow2,
we create an empty qcow2 file and put the dirty bitmaps there". Please
don't do that. If you are using something else and want this feature,
that's your problem. If you need features, you use qcow2. That's it. If
you really want to support it for other file formats, but the data into
tar archives and not into qcow2 files.

For comparison, this is like using a qcow2 file for implementing backing
files for raw images. The cluster offsets in L2 tables would then point
to offsets in the raw image (and the host offset would have to match the
guest offset), and by looking at which L2 table entries are unused, one
could deduce which sectors are to be read from the backing file. We
don't support that either, because you should just use qcow2 if you want
backing files.

Next we need to know for every dirty bitmap what the reference disk is.
Since generally that reference disk is stored in some image file
somewhere, I'd add a filename for each of the dirty bitmaps which is the
base file in respect to which these clusters are considered dirty.

As a measurement on how well you have done to associate a dirty bitmap
with a qcow2 file, imagine the following scenario: You are writing a
program independent of qemu, and that program is to make use of the
dirty bitmaps for incremental backups.

With my proposal above, it would open the qcow2 file and pick some
bitmap based on name, base image, user choice or maybe some property of
the bitmap itself (e.g. lowest dirty bit count). Then, it would create a
new overlay file (the backup image), let's say a qcow2 file, and use the
base image filename of the selected dirty bitmap as the filename of the
backing file for the backup image. Then, it would copy all dirty
clusters from the original qcow2 file to the backup image, and that's it.

Right now, with this patch alone, the tool has no idea what the base
image is, and some bitmaps may not even be related to the very qcow2
file they are in at all.


With that fixed, I could be moved to accept the concept of dirty bitmaps
in qcow2 files grudgingly. Maybe happily, if you give me a good reason
why we should not put them into tar files.


And I have some other comments in regards to the specification:

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 121dfc8..5fc0365 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -103,7 +103,13 @@ in the description of a field.
                      write to an image with unknown auto-clear features if it
                      clears the respective bits from this field first.
- Bits 0-63: Reserved (set to 0)
+                    Bit 0:      Dirty bitmaps bit. If this bit is set then
+                                there is a _consistent_ Dirty bitmaps extension
+                                in the image. If it is not set, but there is a
+                                Dirty bitmaps extension, its data should be
+                                considered as inconsistent.
+
+                    Bits 1-63:  Reserved (set to 0)
96 - 99: refcount_order
                      Describes the width of a reference count block entry 
(width
@@ -123,6 +129,7 @@ be stored. Each extension has a structure like the 
following:
                          0x00000000 - End of the header extension area
                          0xE2792ACA - Backing file format name
                          0x6803f857 - Feature name table
+                        0x23852875 - Dirty bitmaps
                          other      - Unknown header extension, can be safely
                                       ignored
@@ -166,6 +173,24 @@ the header extension data. Each entry look like this:
                      terminated if it has full length)
+== Dirty bitmaps ==
+
+Dirty bitmaps is an optional header extension. It provides an ability to store
+dirty bitmaps in a qcow2 image. The fields are:
+
+          0 -  3:  nb_dirty_bitmaps
+                   The number of dirty bitmaps contained in the image. Valid
+                   values: 0 - 65535.
Why? Because that's what qemu supports? That's not a real reason. If so,
you may make a note of that (see the cluster_bits documentation), or
just omit it; for years, qemu only supported refcount_order = 4, but the
specification did not make a note of that. It was just a limitation of
qemu, but not of the format.

+
+          4 -  7:  dirty_bitmap_directory_size
+                   Size of the Dirty Bitmap Directory in bytes. Valid values:
+                   0 - 67108864 (= 1024 * nb_dirty_bitmaps).
Same here.

+
+          8 - 15:  dirty_bitmap_directory_offset
+                   Offset into the image file at which the Dirty Bitmap
+                   Directory starts. Must be aligned to a cluster boundary.
+
+
  == Host cluster management ==
qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -360,3 +385,103 @@ Snapshot table entry:
variable: Padding to round up the snapshot table entry size to the
                      next multiple of 8.
+
+
+== Dirty bitmaps ==
+
+The feature supports storing dirty bitmaps in a qcow2 image.
I think I've made my point clear enough in the huge wall of text above,
but I'll just repeat it once more: This should explain what dirty
bitmaps are and how they are to be interpreted.

+
+=== Cluster mapping ===
+
+Dirty bitmaps are stored using a ONE-level structure for the mapping of
+bitmaps to host clusters. It is called Dirty Bitmap Table.
+
+The Dirty Bitmap Table has a variable size (stored in the Dirty Bitmap
+Directory Entry) and may use multiple clusters, however it must be contiguous
+in the image file.
+
+Given an offset (in bytes) into the bitmap, the offset into the image file can
+be obtained as follows:
+
+    byte_offset =
+        dirty_bitmap_table[offset / cluster_size] + (offset % cluster_size)
+
+Taking into accout the granularity of the bitmap, an offset in bits into the
+image file can be obtained like this:
+
+    bit_offset =
+        byte_offset(bit_nr / granularity / 8) * 8 + (bit_nr / granularity) % 8
+
+Here bit_nr is a number of "virtual" bit of the bitmap, which is covered by
+"physical" bit with number (bit_nr / granularity).
+
+Dirty Bitmap Table entry:
+
+    Bit  0 -  8:    Reserved
+
+         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
+                    cluster boundary. If the offset is 0, the cluster is
+                    unallocated, and should be read as all zeros.
+
+        56 - 63:    Reserved
+
+=== Dirty Bitmap Directory ===
+
+Each dirty bitmap, saved in the image is described in the Dirty Bitmap
+Directory entry. Dirty Bitmap Directory is a contiguous area in the image file,
+whose starting offset and length are given by the header extension fields
+dirty_bitmap_directory_offset and dirty_bitmap_directory_size. The entries of
+the bitmap directory have variable length, depending on the length of the
+bitmap name.
+
+Dirty Bitmap Directory Entry:
+
+    Byte 0 -  7:    dirty_bitmap_table_offset
+                    Offset into the image file at which the Dirty Bitmap Table
+                    for the bitmap starts. Must be aligned to a cluster
+                    boundary.
+
+         8 - 15:    nb_virtual_bits
+                    Number of "virtual" bits in the bitmap. Number of
+                    "physical" bits would be:
+                    (nb_virtual_bits + granularity - 1) / granularity
+
+        16 - 19:    dirty_bitmap_table_size
+                    Number of entries in the Dirty Bitmap Table of the bitmap.
+                    Valid values: 0 - 0x8000000.
+                    Also, (dirty_bitmap_table_size * cluster_size) should not
+                    be greater than 0x20000000 (512 MB)
Again, is this a qemu limitation or is there another reason? Also, you
should decide between the two limitations. The second one automatically
limits the number of values to 0 - 1048575 at maximum (512 byte clusters).

+
+        20 - 23:    granularity_bits
+                    Granularity bits. Valid values are: 0 - 63.
+
+                    Granularity is calculated as
+                        granularity = 1 << granularity_bits
+
+                    Granularity of the bitmap is how many "virtual" bits
+                    accounts for one "physical" bit.
+
+        24 - 27:    flags
+                    Bit
+                      0: in_use
+                         The bitmap is in use and may be inconsistent.
What does "in use" mean? You are not supposed to use a qcow2 file which
is in use by qemu anyway.

+
+                      1: self
+                         The bitmap is a dirty bitmap for the containing image.
As I said, I don't see why we should support this ever being not set, so
I am very much in favor of dropping this.

+
+                      2: auto
+                         The bitmap should be autoloaded as block dirty bitmap.
+                         Only available if bit 1 (self) is set.
The phrasing is too qemu-specific. Remember that this is not an
explanation for how qemu is to interpret qcow2 files, but a
*specification* of qcow2 files for *any* tool.

So if I understand the intention behind this flag, a more general
expression would be "The default bitmap". Then it is qemu's decision to
always auto-load this default bitmap.

+
+                      3: read_only
+                         The bitmap should not be rewritten.
+
+                    Bits 4 - 31 are reserved.
+
+        28 - 29:    name_size
+                    Size of the bitmap name. Valid values: 0 - 1023.
+
+        variable:   The name of the bitmap (not null terminated).
+
+        variable:   Padding to round up the Dirty Bitmap Directory Entry size 
to
+                    the next multiple of 8.

The interesting thing is that I have written a huge wall of text above
and all my comments (except for "just put it into tar") can be addressed
relatively easy. Just add documentation for what dirty bitmaps are, and
a "variable: base_filename" field here, and that would be it.

But there is a reason why I'm keeping the wall of text there: I feel
like while these are very minor changes, they are fundamental design
differences. Without these changes, you just add a binary data dump
extension to qcow2, which is of no use to anyone but qemu (and not even
qemu alone, it needs the user or a management tool to tell it what to do
with it, unless the @auto flag is set).

With these changes, it suddenly actually becomes an integral part of the
qcow2 file which can be interpreted and used in a meaningful way by
tools other than qemu itself.

Max

This is actually not a very big deal from my point of view if this would
put us into agreement and will allow to proceed further. The bitmap will
be available in QCOW2, Parallels image can also have bitmap inside,
not yet code on QEMU side.

This would be enough for me for a while.

Thus the question is on John side whether "bitmap in separate file"
feature is really necessary. This is mostly API question.

Den



reply via email to

[Prev in Thread] Current Thread [Next in Thread]