qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specificat


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification
Date: Sat, 16 Jan 2016 17:06:41 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0

On 15.01.2016 02:26, John Snow wrote:

On 01/14/2016 05:08 PM, Eric Blake wrote:
On 01/11/2016 06:05 AM, Vladimir Sementsov-Ogievskiy wrote:
The new feature for qcow2: storing bitmaps.

This patch adds new header extension to qcow2 - Bitmaps Extension. It
provides an ability to store virtual disk related bitmaps in a qcow2
image. For now there is only one type of such bitmaps: Dirty Tracking
Bitmap, which just tracks virtual disk changes from some moment.

Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
should be stored in this qcow2 file. The size of each bitmap
(considering its granularity) is equal to virtual disk size.

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---

@@ -166,6 +178,34 @@ the header extension data. Each entry look like this:
                      terminated if it has full length)
+== Bitmaps extension ==
+          0 -  3:  nb_bitmaps
+                   The number of bitmaps contained in the image. Must be
+                   greater than or equal to 1.
+
+                   Note: Qemu currently only supports up to 65535 bitmaps per
+                   image.
+
+          4 -  7:  bitmap_directory_size
+                   Size of the bitmap directory in bytes. It is the cumulative
+                   size of all (nb_bitmaps) bitmap headers.
Only 4 bytes - if we ever raise our 64k entry restriction (nb_bitmaps),
could we run into an image that has so many directory entries as to make
the directory itself spill past 4G?  But I don't think it is likely, so
I can live with your choice.

"We'll never need this!"

I hope someone in 2082 is reading this right now and is quite angry.

(But really, I can't foresee needing this many per each drive -- and if
we do, we have external storage mechanisms in development to handle such
wild cases.)

Let's change it to 8 bytes, and add 4bytes reserved field to keep 8b alignment.


+
+== Bitmaps ==
+
+As mentioned above, the bitmaps extension provides the ability to store bitmaps
+related a virtual disk. This section describes how these bitmaps are stored.
+
+Note: all bitmaps are related to the virtual disk stored in this image.
+
+=== Bitmap directory ===
+
+Each bitmap saved in the image is described in a bitmap directory entry. The
+bitmap directory is a contiguous area in the image file, whose starting offset
+and length are given by the header extension fields bitmap_directory_offset and
+bitmap_directory_size. The entries of the bitmap directory have variable
+length, depending on the length of the bitmap name and extra data. These
+entries are also called bitmap headers.
+
+Structure of a bitmap directory entry:
+
+    Byte 0 -  7:    bitmap_table_offset
+                    Offset into the image file at which the bitmap table
+                    (described below) for the bitmap starts. Must be aligned to
+                    a cluster boundary.
+
+         8 - 11:    bitmap_table_size
+                    Number of entries in the bitmap table of the bitmap.
Should this be the size in bytes, instead of the number of entries? But
For what benefit? We can calculate either from the other, and this gives
us a better resolution.

at least the entries are fixed width of 8 bytes each, so this lets you
get a bitmap table up to 32G bytes rather than just 4G in size.  (Let's
see here - if we have 32G bytes in the bitmap table, that means 4G
clusters occupied by the bitmap itself; in the worst case of 512-byte
clusters and granularity 0, that is a maximum bitmap size of 2T
describing 16T of virtual guest image; with larger cluster size and/or
larger granularity, we cover a lot more virtual guest space with less
bitmap size; so I guess we aren't too worried about running out of space?).

Yes, worst case of g=0 and cluster size of 512 bytes, we can get 2T
bitmaps describing 16T of virtual data.

"default case" of 64K clusters and 64K granularity: 256TiB bitmaps
describing ... let's see ... if my math is right, 128EiB?

We're probably fine :)

(Cue future space-person from 2159 wondering how I could have ever been
so naive. Sorry, future space-person!)

it is just like l1_size from qcow2 header.

         36 - 39:   l1_size
                    Number of entries in the active L1 table



+        20 - 23:    extra_data_size
+                    Size of type-specific extra data.
+
+                    For now, as no extra data is defined, extra_data_size is
+                    reserved and must be zero.
+
+        variable:   Type-specific extra data for the bitmap.
I'd write this as:
            variable:   extra_data
                        Type-specific extra data for the bitmap,
                        occupying extra_data_size bytes.

Ok.. then, similar should be done for the name.

About extra data and our discussion of versioning it. For now, we don't have concrete cases and requirements for this data, we are only foresee, that it may be needed in future, and waste time in assumptions about what kind of data will it be..

Will it be really type specific, i.e. bitmap type strictly define its structure? Or it will contain self-defined fields, some of them will be optional, some of them will depend on bitmap type or something like this? We have no concrete cases and strategy of using this data... We need not versioning for now, as we have no versions).

The only question, that should be answered, what should do the software, if it find extra_data of unknown format. (currently any extra data is unknown).

variants:

1) abort (or, more precisely: "image file must not be written") (like incompatible feature)
2) delete extra data (like autoclear feature)
3) ignore (like compatible feature)

We can use two bits of flags field to specify the behavior.

===========================================

Or, the other way, we should fix extra data format like this:

Extra data contains several fields of the following format:

4b: field magic
1b: flags: specifies abort/delete/ignore behavior if the field is unknown
1b: field name size - for error message if the field is unknown
2b: field size
var: field name
var: field data
var: padding

- where I've seen it...

=================

And the last option:

We can select abort behavior for now for any extra data, i.e. just remove "variable extra_data" and make field extra_data_size 'reserved must be zero'.

================


So, lets finally choose something and go on.


Note: for ways 1 and 3, we should in future define extra data in the way that gives a software clear method to understand, is this extra data known or unknown for it. I.e. it should have some magic number, defining format, or version, or structure like in way 2.


+
+        variable:   The name of the bitmap (not null terminated). Must be
+                    unique among all bitmap names within the bitmaps extension.
+
+        variable:   Padding to round up the bitmap directory entry size to the
+                    next multiple of 8.
Should we require the padding to be all NUL bytes?  (We aren't
consistent on whether we require that for other locations of padding in
the spec, so that could be a followup patch).

Let's require this, I do not see any shortcomings of this, only benefits.


+
+=== Bitmap table ===
+
+Bitmaps are stored using a one-level structure (as opposed to two-level
+structure like for refcounts and guest clusters mapping) for the mapping of
+bitmap data to host clusters. This structure is called the bitmap table.
Possible wording tweak:
Bitmaps are stored using a one-level structure (as opposed to the
two-level structures for refcounts and guest cluster mapping), and are
used for the mapping of bitmap data to host clusters

No, this is wrong.. Who are used? Bitmaps are stored and are used? No, not bitmaps, structures are used.


+
+Each bitmap table has a variable size (stored in the bitmap directory Entry)
Does 'Entry' still need to be capitalized?

It's a mistake, thanks.


+and may use multiple clusters, however, it must be contiguous in the image
+file.
+
+Structure of a bitmap table entry:
+
+    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
+                    If bits 9 - 55 are zero:
+                      0: Cluster should be read as all zeros.
+                      1: Cluster should be read as all ones.
+
+         1 -  8:    Reserved and must be zero.
+
+         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be aligned to
+                    a cluster boundary. If the offset is 0, the cluster is
+                    unallocated; in that case, bit 0 determines how this
+                    cluster should be treated when read from.
Possible wording tweak:
s/when read from/during reads/.

ok


+
+        56 - 63:    Reserved and must be zero.
+
+=== Bitmap data ===
+
+As noted above, bitmap data is stored in separate clusters, described by the
+bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
+the image file can be obtained as follows:
+
+    image_offset =
+        bitmap_table[bitmap_data_offset / cluster_size] +
+            (bitmap_data_offset % cluster_size)
+
+This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see
+above).
+
+Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
+bit offset into the bitmap can be calculated like this:
+
+    bit_offset =
+        image_offset(byte_nr / granularity / 8) * 8 +
+            (byte_nr / granularity) % 8
+
+If the size of the bitmap data is not a multiply of cluster size then the last
s/multiply of cluster size/multiple of the cluster size,/

ok, thanks


+cluster of the bitmap data contains some unused tail bits. These bits must be
+zero.

Thanks!

--js

Please, let's decide finally about extra data, than I'll reroll it and, I hope, it will be committed, to make it possible to continue work on persistence series. About extra data, I'm ready to accept any variant, strictly defining, what software should do with unknown extra data.


--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]