qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmaps specificati


From: John Snow
Subject: Re: [Qemu-devel] [PATCH 03/17] spec: add qcow2-dirty-bitmaps specification
Date: Thu, 8 Oct 2015 16:28:00 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0


On 10/07/2015 03:05 PM, Denis V. Lunev wrote:
> On 10/07/2015 07:47 PM, Max Reitz wrote:
>> On 05.09.2015 18:43, Vladimir Sementsov-Ogievskiy wrote:
>>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>>> other drives (there may be qcow2 file with zero disk size but with
>>> several dirty bitmaps for other drives).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>> ---
>>>   docs/specs/qcow2.txt | 127
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 126 insertions(+), 1 deletion(-)
>> Overall: I'm strongly against putting dirty bitmaps into qcow2 files, at
>> least not as it is envisioned by this series.
>>
>>
>> If you don't feel like reading why, and you'd rather read what I'd do if
>> you really, really want to put them into qcow2, files, skip ahead until
>> the "RANT OVER" line.
>>
>>
>> The first indication of why that is the case is that this patch does not
>> add any explanation to the qcow2 specification what these dirty bitmaps
>> are. Therefore, there are basically just binary data that is given a
>> name and dumped into a qcow2 file as if it were a tar file.
>>
>> One could argue that this is qemu and we know what dirty bitmaps are.
>> But qcow2.txt is located in docs/specs/, not just in docs/. It is not an
>> explanation, but a *specification*, and as such it should 
>> explainhttp://www.preining.info/blog/2015/10/looking-at-the-facts-sarah-sharps-crusade/
>> everything related to qcow2.
>>
>> As a side notice, we already have a binary data dump in qcow2 files, and
>> that is the VM state. This is bad enough and if it would have been up to
>> me, it would have never been there. That's because it's something only
>> qemu can make use of, and not even different versions of qemu are
>> compatible there, so it was (in my opinion) a pretty bad idea to put it
>> into qcow2.
>>
>> So what this specification is definitely lacking is an explanation on
>> how any independent program (i.e. *not qemu*) is to interpret the dirty
>> bitmaps. I do believe this is possible, as opposed to the VM state. The
>> VM state, nobody can do anything with it, it's even difficult for qemu
>> itself sometimes.
>>
>> So let's imagine this specification would contain an explanation on what
>> dirty bitmaps are and what they mean. Actually, now that I think about
>> it, I cannot really imagine it, because I'm lacking that explanation.
>> What do they mean? As far as I can see from the series, they actually
>> don't mean anything. It's just a dump of data into a qcow2 file, and it
>> can be any bitmap, be it associated with the file itself or not.
>>
>> This is further pointed to by your feature proposal "Allow qcow2 images
>> without l1_table and other staff but only with dirty bitmaps with
>> minimum overhead". There is a file format for exactly that, and it's
>> called tar (yes, you are missing some metadata, but just add a JSON
>> description file to the archive and you're done).
>>
>> By the way: I heard John briefly touch this in his talk at KVM Forum
>> when he explained that this would make qcow2 files something like better
>> tar files, and I didn't like the idea back then either. I was hoping
>> that it would actually be differently, and was waiting for some
>> discussion to appear, but I didn't notice this series, because it
>> doesn't have "qcow2" in the cover letter's subject (and I wasn't CC'd,
>> but I don't really see why I should have been, as I'm not mentioned in
>> the MAINTAINERS file (what a lucky man I am!)). I only just noticed
>> today when I saw a lone reply from John on qemu-block to a patch with a
>> "qcow2:" prefix.
>>
>> So, what you are apparently planning to do is to dump dirty bitmaps into
>> any available qcow2 file. If the image you are operating on is a qcow2
>> file, great! If it isn't, you create some empty qcow2 file and dump the
>> bitmaps there.
>>
>> Then, I'm asking myself why you don't use tar files in the second case,
>> and then, why you don't use tar files in the first case. I do remember
>> John saying that there was a dicussion about it, but I don't know about
>> it, so I don't know why you dropped that idea in favor of making qcow2
>> files tar archives. The only reason I can think of off the top of my
>> head is that we have infrastructure for reading qcow2 files, but not for
>> tar files. However, this series is like just appending a tar file to a
>> qcow2 file, and then implementing a reader for tar archives inside of
>> the qcow2 driver, so it doesn't seem to be much simpler in practice.
>>
>> In any case, if my assumptions so far are more or less correct, no
>> outside program can do anything with the dirty bitmaps contained in the
>> qcow2 file, because they are just binary data which does not necessarily
>> have any connection to the qcow2 file itself. Not even qemu can make
>> sense of them, it appears, it needs the user or the management tool to
>> do so.
>>
>> I am strongly against putting binary data into a qcow2 file which does
>> not have any visible connection to the file's contents.
>>
>> Obviously, it is possible that there is some connection which I am just
>> not seeing, though.
>>
>>
>>
>> --- RANT OVER ---
>>
>> Okay, that was enough destructive criticism, now to get some
>> constructive arguments and ideas.
>>
>> So, there are two points I don't like: First, it's binary data which
>> isn't explained in the qcow2 specification. This can easily be fixed.
>>
>> Second, there is no obvious connection between the qcow2 file and a
>> dirty bitmap. I'd drop the idea of "If you use anything else than qcow2,
>> we create an empty qcow2 file and put the dirty bitmaps there". Please
>> don't do that. If you are using something else and want this feature,
>> that's your problem. If you need features, you use qcow2. That's it. If
>> you really want to support it for other file formats, but the data into
>> tar archives and not into qcow2 files.
>>
>> For comparison, this is like using a qcow2 file for implementing backing
>> files for raw images. The cluster offsets in L2 tables would then point
>> to offsets in the raw image (and the host offset would have to match the
>> guest offset), and by looking at which L2 table entries are unused, one
>> could deduce which sectors are to be read from the backing file. We
>> don't support that either, because you should just use qcow2 if you want
>> backing files.
>>
>> Next we need to know for every dirty bitmap what the reference disk is.
>> Since generally that reference disk is stored in some image file
>> somewhere, I'd add a filename for each of the dirty bitmaps which is the
>> base file in respect to which these clusters are considered dirty.
>>
>> As a measurement on how well you have done to associate a dirty bitmap
>> with a qcow2 file, imagine the following scenario: You are writing a
>> program independent of qemu, and that program is to make use of the
>> dirty bitmaps for incremental backups.
>>
>> With my proposal above, it would open the qcow2 file and pick some
>> bitmap based on name, base image, user choice or maybe some property of
>> the bitmap itself (e.g. lowest dirty bit count). Then, it would create a
>> new overlay file (the backup image), let's say a qcow2 file, and use the
>> base image filename of the selected dirty bitmap as the filename of the
>> backing file for the backup image. Then, it would copy all dirty
>> clusters from the original qcow2 file to the backup image, and that's it.
>>
>> Right now, with this patch alone, the tool has no idea what the base
>> image is, and some bitmaps may not even be related to the very qcow2
>> file they are in at all.
>>
>>
>> With that fixed, I could be moved to accept the concept of dirty bitmaps
>> in qcow2 files grudgingly. Maybe happily, if you give me a good reason
>> why we should not put them into tar files.
>>
>>
>> And I have some other comments in regards to the specification:
>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 121dfc8..5fc0365 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -103,7 +103,13 @@ in the description of a field.
>>>                       write to an image with unknown auto-clear
>>> features if it
>>>                       clears the respective bits from this field first.
>>>   -                    Bits 0-63:  Reserved (set to 0)
>>> +                    Bit 0:      Dirty bitmaps bit. If this bit is
>>> set then
>>> +                                there is a _consistent_ Dirty
>>> bitmaps extension
>>> +                                in the image. If it is not set, but
>>> there is a
>>> +                                Dirty bitmaps extension, its data
>>> should be
>>> +                                considered as inconsistent.
>>> +
>>> +                    Bits 1-63:  Reserved (set to 0)
>>>              96 -  99:  refcount_order
>>>                       Describes the width of a reference count block
>>> entry (width
>>> @@ -123,6 +129,7 @@ be stored. Each extension has a structure like
>>> the following:
>>>                           0x00000000 - End of the header extension area
>>>                           0xE2792ACA - Backing file format name
>>>                           0x6803f857 - Feature name table
>>> +                        0x23852875 - Dirty bitmaps
>>>                           other      - Unknown header extension, can
>>> be safely
>>>                                        ignored
>>>   @@ -166,6 +173,24 @@ the header extension data. Each entry look
>>> like this:
>>>                       terminated if it has full length)
>>>     +== Dirty bitmaps ==
>>> +
>>> +Dirty bitmaps is an optional header extension. It provides an
>>> ability to store
>>> +dirty bitmaps in a qcow2 image. The fields are:
>>> +
>>> +          0 -  3:  nb_dirty_bitmaps
>>> +                   The number of dirty bitmaps contained in the
>>> image. Valid
>>> +                   values: 0 - 65535.
>> Why? Because that's what qemu supports? That's not a real reason. If so,
>> you may make a note of that (see the cluster_bits documentation), or
>> just omit it; for years, qemu only supported refcount_order = 4, but the
>> specification did not make a note of that. It was just a limitation of
>> qemu, but not of the format.
>>
>>> +
>>> +          4 -  7:  dirty_bitmap_directory_size
>>> +                   Size of the Dirty Bitmap Directory in bytes.
>>> Valid values:
>>> +                   0 - 67108864 (= 1024 * nb_dirty_bitmaps).
>> Same here.
>>
>>> +
>>> +          8 - 15:  dirty_bitmap_directory_offset
>>> +                   Offset into the image file at which the Dirty Bitmap
>>> +                   Directory starts. Must be aligned to a cluster
>>> boundary.
>>> +
>>> +
>>>   == Host cluster management ==
>>>     qcow2 manages the allocation of host clusters by maintaining a
>>> reference count
>>> @@ -360,3 +385,103 @@ Snapshot table entry:
>>>             variable:   Padding to round up the snapshot table entry
>>> size to the
>>>                       next multiple of 8.
>>> +
>>> +
>>> +== Dirty bitmaps ==
>>> +
>>> +The feature supports storing dirty bitmaps in a qcow2 image.
>> I think I've made my point clear enough in the huge wall of text above,
>> but I'll just repeat it once more: This should explain what dirty
>> bitmaps are and how they are to be interpreted.
>>
>>> +
>>> +=== Cluster mapping ===
>>> +
>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>> +bitmaps to host clusters. It is called Dirty Bitmap Table.
>>> +
>>> +The Dirty Bitmap Table has a variable size (stored in the Dirty Bitmap
>>> +Directory Entry) and may use multiple clusters, however it must be
>>> contiguous
>>> +in the image file.
>>> +
>>> +Given an offset (in bytes) into the bitmap, the offset into the
>>> image file can
>>> +be obtained as follows:
>>> +
>>> +    byte_offset =
>>> +        dirty_bitmap_table[offset / cluster_size] + (offset %
>>> cluster_size)
>>> +
>>> +Taking into accout the granularity of the bitmap, an offset in bits
>>> into the
>>> +image file can be obtained like this:
>>> +
>>> +    bit_offset =
>>> +        byte_offset(bit_nr / granularity / 8) * 8 + (bit_nr /
>>> granularity) % 8
>>> +
>>> +Here bit_nr is a number of "virtual" bit of the bitmap, which is
>>> covered by
>>> +"physical" bit with number (bit_nr / granularity).
>>> +
>>> +Dirty Bitmap Table entry:
>>> +
>>> +    Bit  0 -  8:    Reserved
>>> +
>>> +         9 - 55:    Bits 9-55 of host cluster offset. Must be
>>> aligned to a
>>> +                    cluster boundary. If the offset is 0, the
>>> cluster is
>>> +                    unallocated, and should be read as all zeros.
>>> +
>>> +        56 - 63:    Reserved
>>> +
>>> +=== Dirty Bitmap Directory ===
>>> +
>>> +Each dirty bitmap, saved in the image is described in the Dirty Bitmap
>>> +Directory entry. Dirty Bitmap Directory is a contiguous area in the
>>> image file,
>>> +whose starting offset and length are given by the header extension
>>> fields
>>> +dirty_bitmap_directory_offset and dirty_bitmap_directory_size. The
>>> entries of
>>> +the bitmap directory have variable length, depending on the length
>>> of the
>>> +bitmap name.
>>> +
>>> +Dirty Bitmap Directory Entry:
>>> +
>>> +    Byte 0 -  7:    dirty_bitmap_table_offset
>>> +                    Offset into the image file at which the Dirty
>>> Bitmap Table
>>> +                    for the bitmap starts. Must be aligned to a cluster
>>> +                    boundary.
>>> +
>>> +         8 - 15:    nb_virtual_bits
>>> +                    Number of "virtual" bits in the bitmap. Number of
>>> +                    "physical" bits would be:
>>> +                    (nb_virtual_bits + granularity - 1) / granularity
>>> +
>>> +        16 - 19:    dirty_bitmap_table_size
>>> +                    Number of entries in the Dirty Bitmap Table of
>>> the bitmap.
>>> +                    Valid values: 0 - 0x8000000.
>>> +                    Also, (dirty_bitmap_table_size * cluster_size)
>>> should not
>>> +                    be greater than 0x20000000 (512 MB)
>> Again, is this a qemu limitation or is there another reason? Also, you
>> should decide between the two limitations. The second one automatically
>> limits the number of values to 0 - 1048575 at maximum (512 byte
>> clusters).
>>
>>> +
>>> +        20 - 23:    granularity_bits
>>> +                    Granularity bits. Valid values are: 0 - 63.
>>> +
>>> +                    Granularity is calculated as
>>> +                        granularity = 1 << granularity_bits
>>> +
>>> +                    Granularity of the bitmap is how many "virtual"
>>> bits
>>> +                    accounts for one "physical" bit.
>>> +
>>> +        24 - 27:    flags
>>> +                    Bit
>>> +                      0: in_use
>>> +                         The bitmap is in use and may be inconsistent.
>> What does "in use" mean? You are not supposed to use a qcow2 file which
>> is in use by qemu anyway.
>>
>>> +
>>> +                      1: self
>>> +                         The bitmap is a dirty bitmap for the
>>> containing image.
>> As I said, I don't see why we should support this ever being not set, so
>> I am very much in favor of dropping this.
>>
>>> +
>>> +                      2: auto
>>> +                         The bitmap should be autoloaded as block
>>> dirty bitmap.
>>> +                         Only available if bit 1 (self) is set.
>> The phrasing is too qemu-specific. Remember that this is not an
>> explanation for how qemu is to interpret qcow2 files, but a
>> *specification* of qcow2 files for *any* tool.
>>
>> So if I understand the intention behind this flag, a more general
>> expression would be "The default bitmap". Then it is qemu's decision to
>> always auto-load this default bitmap.
>>
>>> +
>>> +                      3: read_only
>>> +                         The bitmap should not be rewritten.
>>> +
>>> +                    Bits 4 - 31 are reserved.
>>> +
>>> +        28 - 29:    name_size
>>> +                    Size of the bitmap name. Valid values: 0 - 1023.
>>> +
>>> +        variable:   The name of the bitmap (not null terminated).
>>> +
>>> +        variable:   Padding to round up the Dirty Bitmap Directory
>>> Entry size to
>>> +                    the next multiple of 8.
>>>
>> The interesting thing is that I have written a huge wall of text above
>> and all my comments (except for "just put it into tar") can be addressed
>> relatively easy. Just add documentation for what dirty bitmaps are, and
>> a "variable: base_filename" field here, and that would be it.
>>
>> But there is a reason why I'm keeping the wall of text there: I feel
>> like while these are very minor changes, they are fundamental design
>> differences. Without these changes, you just add a binary data dump
>> extension to qcow2, which is of no use to anyone but qemu (and not even
>> qemu alone, it needs the user or a management tool to tell it what to do
>> with it, unless the @auto flag is set).
>>
>> With these changes, it suddenly actually becomes an integral part of the
>> qcow2 file which can be interpreted and used in a meaningful way by
>> tools other than qemu itself.
>>
>> Max
>>
> This is actually not a very big deal from my point of view if this would
> put us into agreement and will allow to proceed further. The bitmap will
> be available in QCOW2, Parallels image can also have bitmap inside,
> not yet code on QEMU side.
> 
> This would be enough for me for a while.
> 
> Thus the question is on John side whether "bitmap in separate file"
> feature is really necessary. This is mostly API question.
> 
> Den

(NB: I never got Max's original reply, so this reply is more to Max than
to Denis or Vladimir.)

I'll see your wall of text and raise you my own wall of text...

We consider the ability to use persistent bitmaps to create incremental
backups for non-qcow2 images to be a necessary and vital component of
complete incremental backup solution, especially considering the core
mechanism of the feature does not really rely on qcow2 for anything
outside of convenience (e.g. backing files.)

Vladimir's initial proposal of using .qcow2 to store the bitmap sounded
good to me, because I wanted to be able to store the bitmap in a qcow2
file anyway, and expanding the specification to allow it to store
/arbitrary/ bitmaps seemed like a natural fit to accomplish both goals
with a minimum of coding.

However, Max has raised some pretty good points here -- allow me to
paraphrase his Wall Of Text™:

- Since this patch is a modification of the qcow2 /specification/ which
is used by more than just QEMU, we must take care to avoid QEMU-isms
limitations and design a more universal approach to the specification
addendum.

- Specific caps on the number of bitmaps, the granularity of said
bitmaps, and the resultant size of said bitmaps should be addressed in
terms of the spec, not in terms of what's necessarily convenient or
sufficient for QEMU. I think this point will be easy to address with
some better spec wording.

- The bitmap language in the spec is generic and doesn't refer much to
anything. This is partially my fault, as I believe I likely guided
Vladimir towards using generic language that was tied more to the
HBitmap format than towards our specific implementation
(BdrvDirtyBitmap.) I recognize this as a bit of a misguided effort on my
part to keep things "generic," but what I succeeded in doing was keeping
it "useless" outside of QEMU. Example: "number of virtual bits" is
meaningless, but "number of 512 byte sectors" is not.

The last bit is the crux of our problem and the most deserving of our
attention:

- For a bitmap to be useful to an application outside of QEMU, all of
the necessary information for interpreting that bitmap must either be
present within the file or referenced. For bitmaps that describe the
file they are stored in, this is trivial with some specification editing.

For bitmaps stored for /other/ files, this gets... trickier. What is
this a bitmap for? What does it describe? What data does it describe?

Node-names and drive names here are useless outside of QEMU and can of
course change between QEMU invocations or be shared between different
QEMU instances, so this is useless ...

We could store filenames, but networked devices and distributed
filesystems may have interesting relative pathnames that will not remain
reliable once the .qcow2 file is shuffled around or migrated, so storing
path-name references seems like a losing battle here, too. Maybe we only
have a file descriptor and no name at all -- what do we write for the
"global identifier that uniquely identifies the data we belong to"? Is
it even possible?

The only conclusion I can reach here is that storing bitmaps inside of a
.qcow2 that remain meaningful to external applications is not going to
be easily possible.

Perhaps we need to abandon the idea that we can store any bitmap we want
into a .qcow2.

However, I'm still a big fan of storing bitmaps that describe the data
they go alongside in the same .qcow2 as a convenience feature --
especially now since Vladimir has done the hard work for us all writing
the feature.

For simple use cases in non-managed environments the use case for
storing the bitmap inside the qcow2 it describes is pretty compelling:

- No extra files to track or manage
- The command line used to boot QEMU the first time can be used to boot
QEMU subsequently, and we get the persistent bitmap automatically
without further modification.
- Migration across a shared medium using .qcow2 files is trivial
- Backups managed by qcow2-unaware applications trivially bring along
our persistent bitmap data for us without additional configuration.

It's simple, the data is meaningful to external applications, and we've
got most of the code we need already, thanks to Vladimir.

Sadly, we still need a way to store bitmap data for files that do not
offer .bitmap_load and .bitmap_store primitives for us.

Presumably, if we devise our own "generic bitmap container" format, we
don't have to store things like node names, filenames, etc in this
container and we can use it to just store (name, granularity, size,
[data]) like we were trying to do in qcow2.

Matching bitmap IDs up to the data they belong to becomes the
responsibility of the user/management layer.

Where this gets hairy, perhaps, is how to enforce that the drive data
that belongs to this bitmap isn't modified without our say-so? How do we
detect de-sync? The quick answer might be to store a hash alongside the
bitmap, and upon being re-applied to the drive if the hash doesn't
match, we throw an error/complain/etc -- but what about cases -- again
-- where we don't necessarily have a file we can trivially hash, like a
many-gigs-wide raw file being mounted over a networked file-system?
"external bitmaps" appear to pose a very real desync risk.

I suppose we would have had that problem anyway with the
qcow2-as-container idea.

Perhaps the best we can say is "This is kind of a dangerous feature to
use, use at your own peril!" and strongly recommend that external
bitmaps are used only in conjunction with a management interface.

That's about all of the thoughts I have on the matter currently.
Does anybody else have strong feelings on where we should go from here?

(A) Argue with Max and push for qcow2-as-container
(B) Use qcow2 for self-reference bitmaps only, use an external format
for formats that do not support .bitmap_load or .bitmap_store
(C) Forget about the qcow2 extension entirely, use only the new external
format
(D) Something else?

My vote is for (B), and if I can find a bit of consensus on that, we can
draft an internal-use specification for the file, but I am very wary of
how we will manage de-sync or if we will be able manage it at all.
("Your fault for touching this file when QEMU was not running.")

Thoughts?

--js



reply via email to

[Prev in Thread] Current Thread [Next in Thread]