qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v2 01/12] qcow2: Add new overlap check functions
Date: Tue, 25 Nov 2014 12:02:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 2014-11-24 at 16:56, Max Reitz wrote:
The existing qcow2 metadata overlap detection function used existing
structures to determine the location of the image metadata, from plain
fields such as l1_table_offset and l1_size in the BDRVQcowState, over
image structures in memory such as the L1 table for the L2 tables'
positions, or it even read the required data directly from disk for
every requested check, such as the snapshot L1 tables for the inactive
L2 tables' positions.

These new functions instead keep a dedicated structure for keeping track
of the metadata positions in memory. It consists of two parts: First,
there is one structure which is basically a list of all metadata
structures. Each entry has a bitmask of types (because some metadata
structures may actually overlap, such as active and inactive L2 tables),
a number of clusters occupied and the offset from the previous entry in
clusters. This structure requires relatively few memory, but checking a
certain point may take relatively long. Each entry is called a
"fragment".

Therefore, there is another representation which is a bitmap, or rather
a bytemap, of metadata types. The previously described list is split
into multiple windows with each describing a constant number of clusters
(WINDOW_SIZE). If the list is to be queried or changed, the respective
window is selected in constant time and the bitmap is generated from the
fragments belonging to the window. This bitmap can then be queried in
constant time and easily be changed.

Because the bitmap representation requires more memory, it is only used
as a cache. Whenever a window is removed from the cache, the fragment
list will be rebuilt from the bitmap if the latter has been modified.
Therefore, the fragment list is only used as the background
representation to save memory, whereas the bitmap is used whenever
possible.

Regarding the size of the fragment list in memory: As one refcount block
can handle cluster_size / 2 entries and one L2 table can handle
cluster_size / 8 entries, for a qcow2 image with the standard cluster
size of 64 kB, there is a ratio of data to metadata of about 1/6000
(1/32768 refblocks and 1/8192 L2 tables) if one ignores the fact that
not every cluster requires an L2 table entry. The refcount table and the
L1 table is generally negligible. At the worst, each metadata cluster
requires its own entry in the fragment list; each entry takes up four
bytes, therefore, at the worst, the fragment list should take up (for an
image with 64 kB clusters) (4 B) / (64 kB * 6000) of the image size,
which is about 1.e-8 (i.e., 11 kB for a 1 TB image, or 11 MB for a 1 PB
image).

Signed-off-by: Max Reitz <address@hidden>
---
  block/Makefile.objs   |   3 +-
  block/qcow2-overlap.c | 404 ++++++++++++++++++++++++++++++++++++++++++++++++++
  block/qcow2.h         |  13 ++
  3 files changed, 419 insertions(+), 1 deletion(-)
  create mode 100644 block/qcow2-overlap.c

diff --git a/block/qcow2-overlap.c b/block/qcow2-overlap.c
new file mode 100644
index 0000000..9f3f2aa
--- /dev/null
+++ b/block/qcow2-overlap.c

[snip]

+
+/**
+ * Destroys the cached window bitmap. If it has been modified, the fragment 
list
+ * will be rebuilt accordingly.
+ */
+static void destroy_window_bitmap(Qcow2MetadataList *mdl,
+                                  Qcow2MetadataWindow *window)
+{
+    if (!window->bitmap) {
+        return;
+    }
+
+    if (window->bitmap_modified) {
+        int bitmap_i, fragment_i = 0;
+        QCow2MetadataOverlap current_types = 0;
+        int current_nb_clusters = 0;
+
+        /* Rebuild the fragment list; the case bitmap_i == WINDOW_SIZE is for
+         * entering the last fragment at the bitmap end */
+
+        for (bitmap_i = 0; bitmap_i <= WINDOW_SIZE; bitmap_i++) {
+            /* Qcow2MetadataFragment::nb_clusters is a uint8_t, so
+             * current_nb_clusters may not exceed 255 */
+            if (bitmap_i < WINDOW_SIZE &&
+                current_types == window->bitmap[bitmap_i] &&
+                current_nb_clusters < 255)
+            {
+                current_nb_clusters++;
+            } else {
+                if (current_types && current_nb_clusters) {
+                    if (fragment_i >= window->fragments_array_size) {
+                        window->fragments_array_size =
+                            3 * window->fragments_array_size / 2 + 1;
+
+                        /* new_nb_fragments should be small enough, and there 
is
+                         * nothing we can do on failure anyway, so do not use
+                         * g_try_renew() here */
+                        window->fragments =
+                            g_renew(Qcow2MetadataFragment, window->fragments,
+                                    window->fragments_array_size);
+                    }
+
+                    window->fragments[fragment_i++] = (Qcow2MetadataFragment){
+                        .types          = current_types,
+                        .nb_clusters    = current_nb_clusters,
+                        .relative_start = bitmap_i - current_nb_clusters,
+                    };
+                }
+
+                current_nb_clusters = 0;

Because I don't want to write a new version every time I find a single bug myself, I'll just reply here, so you know which things I know about.

This should be 1, and not 0. At this point, we have found the first cluster of this fragment, which is why this should be 1.

Max

+                if (bitmap_i < WINDOW_SIZE) {
+                    current_types = window->bitmap[bitmap_i];
+                }
+            }
+        }
+
+        window->nb_fragments = fragment_i;
+    }
+
+    g_free(window->bitmap);
+    window->bitmap = NULL;
+}




reply via email to

[Prev in Thread] Current Thread [Next in Thread]