From: Dor Laor
Subject: Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
Date: Sun, 15 Feb 2009 00:23:34 +0200
User-agent: Thunderbird 2.0.0.18 (X11/20081119)

Jamie Lokier wrote:
> Anthony Liguori wrote:
>>> Simply reverting the qcow2 code appears to fix those problems, so it
>>> needn't hold up cutting a release.  That's what I recommend.
>>
>> Send some patches.
>
> I did already.
>
> Here it is again.  This should fix my bug and Marc's bug according to
> his report that reverting qcow2.c fixes it.

Going back to kvm-72 is not good either.
First, there were qcow2 corruptions before it; they were very rare, but they did exist.  Not long
ago we did not even know that qcow2 was at fault.
In addition, Gleb fixed the ordering of some qcow2 metadata writes.  We need to keep those fixes in.
The real solution is to find the actual cause of the corruption.
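
For context, the ordering concern is roughly the following (an illustrative
sketch only, not Gleb's actual patch; the function name and parameters are
made up):

    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Illustrative only: when linking a freshly allocated L2 table into the
     * image, the table contents must reach the disk before the L1 entry that
     * points at them.  Otherwise a crash between the two writes leaves the
     * L1 entry referencing a cluster full of garbage. */
    static int link_new_l2_table(int image_fd, off_t new_l2_offset,
                                 const uint64_t *l2_table, size_t l2_bytes,
                                 off_t l1_entry_offset)
    {
        /* 1. Write the new L2 table itself. */
        if (pwrite(image_fd, l2_table, l2_bytes, new_l2_offset)
                != (ssize_t)l2_bytes)
            return -1;

        /* 2. Only then update the L1 entry to point at it.  (Real qcow2
         *    code also sets the COPIED flag and byte-swaps to big endian.) */
        uint64_t l1_entry = (uint64_t)new_l2_offset;
        if (pwrite(image_fd, &l1_entry, sizeof(l1_entry), l1_entry_offset)
                != sizeof(l1_entry))
            return -1;

        return 0;
    }
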
> -- Jamie


Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports

This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
kvm-73 and later), from three bug reports.


Bug 1: Windows 2000 guests complain of corrupt registry.

Many Windows 2000 guests which boot and run fine in kvm-72 fail with
a blue screen indicating file corruption errors in kvm-73 through to
kvm-83 (the latest), and boot successfully again if we replace
block-qcow2.c with the version from kvm-72.

The blue screen appears towards the end of the boot sequence, and
shows only briefly before rebooting.  It says:

    STOP: c0000218 (Registry File Failure)
    The registry cannot load the hive (file):
    \SystemRoot\System32\Config\SOFTWARE
    or its log or alternate.
    It is corrupt, absent, or not writable.

    Beginning dump of physical memory
    Physical memory dump complete. Contact your system administrator or
    technical support [...?]

This has been narrowed down to the difference in block-qcow2.c between
kvm-72 and kvm-73 (not kvm-83).  From kvm-73 to kvm-83 there have been
further changes to block-qcow2.c, but the observed corruption still occurs.

The bug isn't evident when only reading.  When using "qemu-img
convert" to convert a qcow2 file to a raw file, with broken and fixed
versions of block-qcow2.c it produces the same raw file.  Also, when
using "-snapshot" with qemu, the blue screen doesn't occur.

This bug was observed by Jamie Lokier <address@hidden> and 
confirmed for multiple Windows 2000 guests by
Marc Bevand <address@hidden>.


Bug 2: Windows 2003 guests complain of corrupt registry.

According to
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Windows 2003 32-bit guests randomly spew disk corruption messages
like this:

    Windows – Registry Hive Recovered
    Registry hive (file): SOFTWARE was corrupted and it has
    been recovered. Some data might have been lost.

and

    The system cannot log on due to the following error:
    Unable to complete the requested operation because of
    either a catastrophic media failure or a data structure
    corruption on the disk.

This bug was reported by <address@hidden> and
confirmed by Marc Bevand, who noted:

    kvm-73+ also causes some of my Windows 2003 guests to exhibit this
    exact registry corruption error.  [...]  This bug is also fixed by
    reverting block-qcow2.c to the version from kvm-72.

Worryingly, gerdwachs' bug report was filed against kvm-70, implying this
patch may not fix all the Windows 2003 guest corruption problems.

At least Marc says his observed problem goes away with kvm-72's qcow2.


Bug 3: Corruption of qcow2 index rendering the file unusable.

Marc Bevand writes:

    I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
    of the qcow2 performance regression caused by the default writethrough
    caching policy) but it randomly triggers an even worse bug: the moment
    I shut down a guest by typing "quit" in the monitor, it sometimes
    overwrites the first 4kB of the disk image with mostly NUL bytes (!)
    which completely destroys it. I am familiar with the qcow2 format and
    this 4kB block appears to be an L2 table with most entries
    set to zero. I have had to restore at least 6 or 7 disk images from
    backup after occurrences of that bug. My intuition tells me this may be
    the qcow2 code trying to allocate a cluster to write a new L2 table,
    but not noticing that the allocation failed (represented by a 0 offset),
    and writing the L2 table at that 0 offset, overwriting the qcow2
    header.

    Fortunately this bug is also fixed by running kvm-75 with
    block-qcow2.c reverted to its kvm-72 version.

    Basically qcow2 in kvm-73 or newer is completely unreliable.
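
Marc's hypothesis, stated as code, would look something like the following.
This is only a schematic sketch of the suspected failure mode, not the
actual qcow2 code; the names are hypothetical:

    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical allocator that signals failure by returning offset 0
     * (in qcow2, offset 0 is where the image header lives). */
    static int64_t allocate_cluster(void)
    {
        /* ... assume the allocation failed ... */
        return 0;
    }

    static void write_new_l2_table(int image_fd, const uint64_t *l2_table,
                                   size_t l2_bytes)
    {
        int64_t l2_offset = allocate_cluster();

        /* Missing check: l2_offset == 0 means the allocation failed, but
         * the table is written anyway, at offset 0, straight over the
         * qcow2 header at the start of the image file. */
        pwrite(image_fd, l2_table, l2_bytes, (off_t)l2_offset);
    }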


Reverting block-qcow2.c to the version in kvm-72 appears to fix the
corruption symptoms reported by Marc and Jamie, although gerdwachs'
related bug was reported against kvm-70, so this patch may not fix that one.

Unfortunately this reverts some optimisations, but fixing corruption
is more important until the new code is reliable.

This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
except the "cache=writeback" default performance tweak is retained and
there's no need to define "offsetof".

Signed-Off-By: Jamie Lokier <address@hidden>


--- kvm-83-real/qemu/block-qcow2.c	2009-01-13 13:29:42.000000000 +0000
+++ kvm-83/qemu/block-qcow2.c	2009-02-13 18:51:12.000000000 +0000
@@ -52,8 +52,6 @@
 #define QCOW_CRYPT_NONE 0
 #define QCOW_CRYPT_AES  1
 
-#define QCOW_MAX_CRYPT_CLUSTERS 32
-
 /* indicate that the refcount of the referenced cluster is exactly one. */
 #define QCOW_OFLAG_COPIED     (1LL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
@@ -269,8 +267,7 @@
     if (!s->cluster_cache)
         goto fail;
     /* one more sector for decompressed data alignment */
-    s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
-                                  + 512);
+    s->cluster_data = qemu_malloc(s->cluster_size + 512);
     if (!s->cluster_data)
         goto fail;
     s->cluster_cache_offset = -1;
@@ -437,7 +434,8 @@
     int new_l1_size, new_l1_size2, ret, i;
     uint64_t *new_l1_table;
     uint64_t new_l1_table_offset;
-    uint8_t data[12];
+    uint64_t data64;
+    uint32_t data32;
 
     new_l1_size = s->l1_size;
     if (min_size <= new_l1_size)
@@ -467,10 +465,13 @@
         new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
 
     /* set new table */
-    cpu_to_be32w((uint32_t*)data, new_l1_size);
-    cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
-    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,
-                sizeof(data)) != sizeof(data))
+    data64 = cpu_to_be64(new_l1_table_offset);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset),
+                    &data64, sizeof(data64)) != sizeof(data64))
+        goto fail;
+    data32 = cpu_to_be32(new_l1_size);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size),
+                    &data32, sizeof(data32)) != sizeof(data32))
         goto fail;
     qemu_free(s->l1_table);
     free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
@@ -483,549 +484,169 @@
     return -EIO;
 }
 
-/*
- * seek_l2_table
+/* 'allocate' is:
  *
- * seek l2_offset in the l2_cache table
- * if not found, return NULL,
- * if found,
- *   increments the l2 cache hit count of the entry,
- *   if counter overflow, divide by two all counters
- *   return the pointer to the l2 cache entry
+ * 0 not to allocate.
  *
- */
-
-static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
-{
-    int i, j;
-
-    for(i = 0; i < L2_CACHE_SIZE; i++) {
-        if (l2_offset == s->l2_cache_offsets[i]) {
-            /* increment the hit count */
-            if (++s->l2_cache_counts[i] == 0xffffffff) {
-                for(j = 0; j < L2_CACHE_SIZE; j++) {
-                    s->l2_cache_counts[j] >>= 1;
-                }
-            }
-            return s->l2_cache + (i << s->l2_bits);
-        }
-    }
-    return NULL;
-}
-
-/*
- * l2_load
+ * 1 to allocate a normal cluster (for sector indexes 'n_start' to
+ * 'n_end')
  *
- * Loads a L2 table into memory. If the table is in the cache, the cache
- * is used; otherwise the L2 table is loaded from the image file.
+ * 2 to allocate a compressed cluster of size
+ * 'compressed_size'. 'compressed_size' must be > 0 and <
+ * cluster_size
  *
- * Returns a pointer to the L2 table on success, or NULL if the read from
- * the image file failed.
+ * return 0 if not allocated.
  */
-
-static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset)
-{
-    BDRVQcowState *s = bs->opaque;
-    int min_index;
-    uint64_t *l2_table;
-
-    /* seek if the table for the given offset is in the cache */
-
-    l2_table = seek_l2_table(s, l2_offset);
-    if (l2_table != NULL)
-        return l2_table;
-
-    /* not found: load a new entry in the least used one */
-
-    min_index = l2_cache_new_entry(bs);
-    l2_table = s->l2_cache + (min_index << s->l2_bits);
-    if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
-        s->l2_size * sizeof(uint64_t))
-        return NULL;
-    s->l2_cache_offsets[min_index] = l2_offset;
-    s->l2_cache_counts[min_index] = 1;
-
-    return l2_table;
-}
-
-/*
- * l2_allocate
- *
- * Allocate a new l2 entry in the file. If l1_index points to an already
- * used entry in the L2 table (i.e. we are doing a copy on write for the L2
- * table) copy the contents of the old L2 table into the newly allocated one.
- * Otherwise the new table is initialized with zeros.
- *
- */
-
-static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index)
-{
-    BDRVQcowState *s = bs->opaque;
-    int min_index;
-    uint64_t old_l2_offset, tmp;
-    uint64_t *l2_table, l2_offset;
-
-    old_l2_offset = s->l1_table[l1_index];
-
-    /* allocate a new l2 entry */
-
-    l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
-
-    /* update the L1 entry */
-
-    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
-
-    tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
-    if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
-                    &tmp, sizeof(tmp)) != sizeof(tmp))
-        return NULL;
-
-    /* allocate a new entry in the l2 cache */
-
-    min_index = l2_cache_new_entry(bs);
-    l2_table = s->l2_cache + (min_index << s->l2_bits);
-
-    if (old_l2_offset == 0) {
-        /* if there was no old l2 table, clear the new table */
-        memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
-    } else {
-        /* if there was an old l2 table, read it from the disk */
-        if (bdrv_pread(s->hd, old_l2_offset,
-                       l2_table, s->l2_size * sizeof(uint64_t)) !=
-            s->l2_size * sizeof(uint64_t))
-            return NULL;
-    }
-    /* write the l2 table to the file */
-    if (bdrv_pwrite(s->hd, l2_offset,
-                    l2_table, s->l2_size * sizeof(uint64_t)) !=
-        s->l2_size * sizeof(uint64_t))
-        return NULL;
-
-    /* update the l2 cache entry */
-
-    s->l2_cache_offsets[min_index] = l2_offset;
-    s->l2_cache_counts[min_index] = 1;
-
-    return l2_table;
-}
-
-static int size_to_clusters(BDRVQcowState *s, int64_t size)
-{
-    return (size + (s->cluster_size - 1)) >> s->cluster_bits;
-}
-
-static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
-        uint64_t *l2_table, uint64_t start, uint64_t mask)
-{
-    int i;
-    uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask;
-
-    if (!offset)
-        return 0;
-
-    for (i = start; i < start + nb_clusters; i++)
-        if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask))
-            break;
-
-	return (i - start);
-}
-
-static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
-{
-    int i = 0;
-
-    while(nb_clusters-- && l2_table[i] == 0)
-        i++;
-
-    return i;
-}
-
-/*
- * get_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * on entry, *num is the number of contiguous clusters we'd like to
- * access following offset.
- *
- * on exit, *num is the number of contiguous clusters we can read.
- *
- * Return 1, if the offset is found
- * Return 0, otherwise.
- *
- */
-
 static uint64_t get_cluster_offset(BlockDriverState *bs,
-                                   uint64_t offset, int *num)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l1_index, l2_index;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int l1_bits, c;
-    int index_in_cluster, nb_available, nb_needed, nb_clusters;
-
-    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
-    nb_needed = *num + index_in_cluster;
-
-    l1_bits = s->l2_bits + s->cluster_bits;
-
-    /* compute how many bytes there are between the offset and
-     * the end of the l1 entry
-     */
-
-    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
-
-    /* compute the number of available sectors */
-
-    nb_available = (nb_available >> 9) + index_in_cluster;
-
-    cluster_offset = 0;
-
-    /* seek the the l2 offset in the l1 table */
-
-    l1_index = offset >> l1_bits;
-    if (l1_index >= s->l1_size)
-        goto out;
-
-    l2_offset = s->l1_table[l1_index];
-
-    /* seek the l2 table of the given l2 offset */
-
-    if (!l2_offset)
-        goto out;
-
-    /* load the l2 table in memory */
-
-    l2_offset &= ~QCOW_OFLAG_COPIED;
-    l2_table = l2_load(bs, l2_offset);
-    if (l2_table == NULL)
-        return 0;
-
-    /* find the cluster offset for the given disk offset */
-
-    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-    cluster_offset = be64_to_cpu(l2_table[l2_index]);
-    nb_clusters = size_to_clusters(s, nb_needed << 9);
-
-    if (!cluster_offset) {
-        /* how many empty clusters ? */
-        c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
-    } else {
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                &l2_table[l2_index], 0, QCOW_OFLAG_COPIED);
-    }
-
-   nb_available = (c * s->cluster_sectors);
-out:
-    if (nb_available > nb_needed)
-        nb_available = nb_needed;
-
-    *num = nb_available - index_in_cluster;
-
-    return cluster_offset & ~QCOW_OFLAG_COPIED;
-}
-
-/*
- * free_any_clusters
- *
- * free clusters according to its type: compressed or not
- *
- */
-
-static void free_any_clusters(BlockDriverState *bs,
-                              uint64_t cluster_offset, int nb_clusters)
-{
-    BDRVQcowState *s = bs->opaque;
-
-    /* free the cluster */
-
-    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
-        int nb_csectors;
-        nb_csectors = ((cluster_offset >> s->csize_shift) &
-                       s->csize_mask) + 1;
-        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
-                      nb_csectors * 512);
-        return;
-    }
-
-    free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits);
-
-    return;
-}
-
-/*
- * get_cluster_table
- *
- * for a given disk offset, load (and allocate if needed)
- * the l2 table.
- *
- * the l2 table offset in the qcow2 file and the cluster index
- * in the l2 table are given to the caller.
- *
- */
-
-static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
-                             uint64_t **new_l2_table,
-                             uint64_t *new_l2_offset,
-                             int *new_l2_index)
+                                   uint64_t offset, int allocate,
+                                   int compressed_size,
+                                   int n_start, int n_end)
 {
     BDRVQcowState *s = bs->opaque;
-    int l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table;
-
-    /* seek the the l2 offset in the l1 table */
+    int min_index, i, j, l1_index, l2_index, ret;
+    uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
-        ret = grow_l1_table(bs, l1_index + 1);
-        if (ret < 0)
+        /* outside l1 table is allowed: we grow the table if needed */
+        if (!allocate)
+            return 0;
+        if (grow_l1_table(bs, l1_index + 1) < 0)
             return 0;
     }
     l2_offset = s->l1_table[l1_index];
+    if (!l2_offset) {
+        if (!allocate)
+            return 0;
+    l2_allocate:
+        old_l2_offset = l2_offset;
+        /* allocate a new l2 entry */
+        l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+        /* update the L1 entry */
+        s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
+        tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
+        if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
+                        &tmp, sizeof(tmp)) != sizeof(tmp))
+            return 0;
+        min_index = l2_cache_new_entry(bs);
+        l2_table = s->l2_cache + (min_index << s->l2_bits);
 
-    /* seek the l2 table of the given l2 offset */
-
-    if (l2_offset & QCOW_OFLAG_COPIED) {
-        /* load the l2 table in memory */
-        l2_offset &= ~QCOW_OFLAG_COPIED;
-        l2_table = l2_load(bs, l2_offset);
-        if (l2_table == NULL)
+        if (old_l2_offset == 0) {
+            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
+        } else {
+            if (bdrv_pread(s->hd, old_l2_offset,
+                           l2_table, s->l2_size * sizeof(uint64_t)) !=
+                s->l2_size * sizeof(uint64_t))
+                return 0;
+        }
+        if (bdrv_pwrite(s->hd, l2_offset,
+                        l2_table, s->l2_size * sizeof(uint64_t)) !=
+            s->l2_size * sizeof(uint64_t))
             return 0;
     } else {
-        if (l2_offset)
-            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
-        l2_table = l2_allocate(bs, l1_index);
-        if (l2_table == NULL)
+        if (!(l2_offset & QCOW_OFLAG_COPIED)) {
+            if (allocate) {
+                free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
+                goto l2_allocate;
+            }
+        } else {
+            l2_offset &= ~QCOW_OFLAG_COPIED;
+        }
+        for(i = 0; i < L2_CACHE_SIZE; i++) {
+            if (l2_offset == s->l2_cache_offsets[i]) {
+                /* increment the hit count */
+                if (++s->l2_cache_counts[i] == 0xffffffff) {
+                    for(j = 0; j < L2_CACHE_SIZE; j++) {
+                        s->l2_cache_counts[j] >>= 1;
+                    }
+                }
+                l2_table = s->l2_cache + (i << s->l2_bits);
+                goto found;
+            }
+        }
+        /* not found: load a new entry in the least used one */
+        min_index = l2_cache_new_entry(bs);
+        l2_table = s->l2_cache + (min_index << s->l2_bits);
+        if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
+            s->l2_size * sizeof(uint64_t))
             return 0;
-        l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
     }
-
-    /* find the cluster offset for the given disk offset */
-
+    s->l2_cache_offsets[min_index] = l2_offset;
+    s->l2_cache_counts[min_index] = 1;
+ found:
     l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-
-    *new_l2_table = l2_table;
-    *new_l2_offset = l2_offset;
-    *new_l2_index = l2_index;
-
-    return 1;
-}
-
-/*
- * alloc_compressed_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * If the offset is not found, allocate a new compressed cluster.
- *
- * Return the cluster offset if successful,
- * Return 0, otherwise.
- *
- */
-
-static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
-                                                uint64_t offset,
-                                                int compressed_size)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int nb_csectors;
-
-    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
-    if (ret == 0)
-        return 0;
-
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
-    if (cluster_offset & QCOW_OFLAG_COPIED)
-        return cluster_offset & ~QCOW_OFLAG_COPIED;
-
-    if (cluster_offset)
-        free_any_clusters(bs, cluster_offset, 1);
-
-    cluster_offset = alloc_bytes(bs, compressed_size);
-    nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
-                  (cluster_offset >> 9);
-
-    cluster_offset |= QCOW_OFLAG_COMPRESSED |
-                      ((uint64_t)nb_csectors << s->csize_shift);
-
-    /* update L2 table */
-
-    /* compressed clusters never have the copied flag */
-
-    l2_table[l2_index] = cpu_to_be64(cluster_offset);
-    if (bdrv_pwrite(s->hd,
-                    l2_offset + l2_index * sizeof(uint64_t),
-                    l2_table + l2_index,
-                    sizeof(uint64_t)) != sizeof(uint64_t))
-        return 0;
-
-    return cluster_offset;
-}
-
-typedef struct QCowL2Meta
-{
-    uint64_t offset;
-    int n_start;
-    int nb_available;
-    int nb_clusters;
-} QCowL2Meta;
-
-static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
-        QCowL2Meta *m)
-{
-    BDRVQcowState *s = bs->opaque;
-    int i, j = 0, l2_index, ret;
-    uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
-
-    if (m->nb_clusters == 0)
-        return 0;
-
-    if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
-        return -ENOMEM;
-
-    /* copy content of unmodified sectors */
-    start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
-    if (m->n_start) {
-        ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
-        if (ret < 0)
-            goto err;
+    if (!cluster_offset) {
+        if (!allocate)
+            return cluster_offset;
+    } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
+        if (!allocate)
+            return cluster_offset;
+        /* free the cluster */
+        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
+            int nb_csectors;
+            nb_csectors = ((cluster_offset >> s->csize_shift) &
+                           s->csize_mask) + 1;
+            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
+                          nb_csectors * 512);
+        } else {
+            free_clusters(bs, cluster_offset, s->cluster_size);
+        }
+    } else {
+        cluster_offset &= ~QCOW_OFLAG_COPIED;
+        return cluster_offset;
     }
-
-    if (m->nb_available & (s->cluster_sectors - 1)) {
-        uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
-        ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
-                m->nb_available - end, s->cluster_sectors);
-        if (ret < 0)
-            goto err;
+    if (allocate == 1) {
+        /* allocate a new cluster */
+        cluster_offset = alloc_clusters(bs, s->cluster_size);
+
+        /* we must initialize the cluster content which won't be
+           written */
+        if ((n_end - n_start) < s->cluster_sectors) {
+            uint64_t start_sect;
+
+            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
+            ret = copy_sectors(bs, start_sect,
+                               cluster_offset, 0, n_start);
+            if (ret < 0)
+                return 0;
+            ret = copy_sectors(bs, start_sect,
+                               cluster_offset, n_end, s->cluster_sectors);
+            if (ret < 0)
+                return 0;
+        }
+        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
+    } else {
+        int nb_csectors;
+        cluster_offset = alloc_bytes(bs, compressed_size);
+        nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
+            (cluster_offset >> 9);
+        cluster_offset |= QCOW_OFLAG_COMPRESSED |
+            ((uint64_t)nb_csectors << s->csize_shift);
+        /* compressed clusters never have the copied flag */
+        tmp = cpu_to_be64(cluster_offset);
     }
-
-    ret = -EIO;
     /* update L2 table */
-    if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index))
-        goto err;
-
-    for (i = 0; i < m->nb_clusters; i++) {
-        if(l2_table[l2_index + i] != 0)
-            old_cluster[j++] = l2_table[l2_index + i];
-
-        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
-     }
-
-    if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
-                l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) !=
-            m->nb_clusters * sizeof(uint64_t))
-        goto err;
-
-    for (i = 0; i < j; i++)
-        free_any_clusters(bs, old_cluster[i], 1);
-
-    ret = 0;
-err:
-    qemu_free(old_cluster);
-    return ret;
- }
-
-/*
- * alloc_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * If the offset is not found, allocate a new cluster.
- *
- * Return the cluster offset if successful,
- * Return 0, otherwise.
- *
- */
-
-static uint64_t alloc_cluster_offset(BlockDriverState *bs,
-                                     uint64_t offset,
-                                     int n_start, int n_end,
-                                     int *num, QCowL2Meta *m)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int nb_clusters, i = 0;
-
-    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
-    if (ret == 0)
+    l2_table[l2_index] = tmp;
+    if (bdrv_pwrite(s->hd,
+                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
         return 0;
-
-    nb_clusters = size_to_clusters(s, n_end << 9);
-
-    nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
-
-    cluster_offset = be64_to_cpu(l2_table[l2_index]);
-
-    /* We keep all QCOW_OFLAG_COPIED clusters */
-
-    if (cluster_offset & QCOW_OFLAG_COPIED) {
-        nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                &l2_table[l2_index], 0, 0);
-
-        cluster_offset &= ~QCOW_OFLAG_COPIED;
-        m->nb_clusters = 0;
-
-        goto out;
-    }
-
-    /* for the moment, multiple compressed clusters are not managed */
-
-    if (cluster_offset & QCOW_OFLAG_COMPRESSED)
-        nb_clusters = 1;
-
-    /* how many available clusters ? */
-
-    while (i < nb_clusters) {
-        i += count_contiguous_clusters(nb_clusters - i, s->cluster_size,
-                &l2_table[l2_index], i, 0);
-
-        if(be64_to_cpu(l2_table[l2_index + i]))
-            break;
-
-        i += count_contiguous_free_clusters(nb_clusters - i,
-                &l2_table[l2_index + i]);
-
-        cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
-
-        if ((cluster_offset & QCOW_OFLAG_COPIED) ||
-                (cluster_offset & QCOW_OFLAG_COMPRESSED))
-            break;
-    }
-    nb_clusters = i;
-
-    /* allocate a new cluster */
-
-    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
-
-    /* save info needed for meta data update */
-    m->offset = offset;
-    m->n_start = n_start;
-    m->nb_clusters = nb_clusters;
-
-out:
-    m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
-
-    *num = m->nb_available - n_start;
-
     return cluster_offset;
 }
 
 static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
                              int nb_sectors, int *pnum)
 {
+    BDRVQcowState *s = bs->opaque;
+    int index_in_cluster, n;
     uint64_t cluster_offset;
 
-    *pnum = nb_sectors;
-    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
-
+    cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
+    index_in_cluster = sector_num & (s->cluster_sectors - 1);
+    n = s->cluster_sectors - index_in_cluster;
+    if (n > nb_sectors)
+        n = nb_sectors;
+    *pnum = n;
     return (cluster_offset != 0);
 }
 
@@ -1102,9 +723,11 @@
     uint64_t cluster_offset;
 
     while (nb_sectors > 0) {
-        n = nb_sectors;
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
+        n = s->cluster_sectors - index_in_cluster;
+        if (n > nb_sectors)
+            n = nb_sectors;
         if (!cluster_offset) {
             if (bs->backing_hd) {
                 /* read from the base image */
@@ -1143,18 +766,15 @@
     BDRVQcowState *s = bs->opaque;
     int ret, index_in_cluster, n;
     uint64_t cluster_offset;
-    int n_end;
-    QCowL2Meta l2meta;
 
     while (nb_sectors > 0) {
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + nb_sectors;
-        if (s->crypt_method &&
-            n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
-            n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
-        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
-                                              index_in_cluster,
-                                              n_end, &n, &l2meta);
+        n = s->cluster_sectors - index_in_cluster;
+        if (n > nb_sectors)
+            n = nb_sectors;
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
+                                            index_in_cluster,
+                                            index_in_cluster + n);
         if (!cluster_offset)
             return -1;
         if (s->crypt_method) {
@@ -1165,10 +785,8 @@
         } else {
             ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
         }
-        if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
-            free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
+        if (ret != n * 512)
             return -1;
-        }
         nb_sectors -= n;
         sector_num += n;
         buf += n * 512;
@@ -1186,33 +804,8 @@
     uint64_t cluster_offset;
     uint8_t *cluster_data;
     BlockDriverAIOCB *hd_aiocb;
-    QEMUBH *bh;
-    QCowL2Meta l2meta;
 } QCowAIOCB;
 
-static void qcow_aio_read_cb(void *opaque, int ret);
-static void qcow_aio_read_bh(void *opaque)
-{
-    QCowAIOCB *acb = opaque;
-    qemu_bh_delete(acb->bh);
-    acb->bh = NULL;
-    qcow_aio_read_cb(opaque, 0);
-}
-
-static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
-{
-    if (acb->bh)
-        return -EIO;
-
-    acb->bh = qemu_bh_new(cb, acb);
-    if (!acb->bh)
-        return -EIO;
-
-    qemu_bh_schedule(acb->bh);
-
-    return 0;
-}
-
 static void qcow_aio_read_cb(void *opaque, int ret)
 {
     QCowAIOCB *acb = opaque;
@@ -1222,12 +815,13 @@
 
     acb->hd_aiocb = NULL;
     if (ret < 0) {
-fail:
+    fail:
         acb->common.cb(acb->common.opaque, ret);
         qemu_aio_release(acb);
         return;
     }
 
+ redo:
     /* post process the read buffer */
     if (!acb->cluster_offset) {
         /* nothing to do */
@@ -1253,9 +847,12 @@
     }
 
     /* prepare next AIO request */
-    acb->n = acb->nb_sectors;
-    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
+    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
+                                             0, 0, 0, 0);
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
+    acb->n = s->cluster_sectors - index_in_cluster;
+    if (acb->n > acb->nb_sectors)
+        acb->n = acb->nb_sectors;
 
     if (!acb->cluster_offset) {
         if (bs->backing_hd) {
@@ -1268,16 +865,12 @@
                 if (acb->hd_aiocb == NULL)
                     goto fail;
             } else {
-                ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-                if (ret < 0)
-                    goto fail;
+                goto redo;
             }
         } else {
             /* Note: in this case, no need to wait */
             memset(acb->buf, 0, 512 * acb->n);
-            ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-            if (ret < 0)
-                goto fail;
+            goto redo;
         }
     } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
         /* add AIO support for compressed blocks ? */
@@ -1285,9 +878,7 @@
             goto fail;
         memcpy(acb->buf,
                s->cluster_cache + index_in_cluster * 512, 512 * acb->n);
-        ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-        if (ret < 0)
-            goto fail;
+        goto redo;
     } else {
         if ((acb->cluster_offset & 511) != 0) {
             ret = -EIO;
@@ -1316,7 +907,6 @@
     acb->nb_sectors = nb_sectors;
     acb->n = 0;
     acb->cluster_offset = 0;
-    acb->l2meta.nb_clusters = 0;
     return acb;
 }
 
@@ -1340,8 +930,8 @@
     BlockDriverState *bs = acb->common.bs;
     BDRVQcowState *s = bs->opaque;
     int index_in_cluster;
+    uint64_t cluster_offset;
     const uint8_t *src_buf;
-    int n_end;
 
     acb->hd_aiocb = NULL;
 
@@ -1352,11 +942,6 @@
         return;
     }
 
-    if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) {
-        free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters);
-        goto fail;
-    }
-
     acb->nb_sectors -= acb->n;
     acb->sector_num += acb->n;
     acb->buf += acb->n * 512;
@@ -1369,22 +954,19 @@
     }
 
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
-    n_end = index_in_cluster + acb->nb_sectors;
-    if (s->crypt_method &&
-        n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
-        n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
-
-    acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
-                                          index_in_cluster,
-                                          n_end, &acb->n, &acb->l2meta);
-    if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) {
+    acb->n = s->cluster_sectors - index_in_cluster;
+    if (acb->n > acb->nb_sectors)
+        acb->n = acb->nb_sectors;
+    cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
+                                        index_in_cluster,
+                                        index_in_cluster + acb->n);
+    if (!cluster_offset || (cluster_offset & 511) != 0) {
         ret = -EIO;
         goto fail;
     }
     if (s->crypt_method) {
         if (!acb->cluster_data) {
-            acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
-                                             s->cluster_size);
+            acb->cluster_data = qemu_mallocz(s->cluster_size);
             if (!acb->cluster_data) {
                 ret = -ENOMEM;
                 goto fail;
@@ -1397,7 +979,7 @@
         src_buf = acb->buf;
     }
     acb->hd_aiocb = bdrv_aio_write(s->hd,
-                                   (acb->cluster_offset >> 9) + index_in_cluster,
+                                   (cluster_offset >> 9) + index_in_cluster,
                                    src_buf, acb->n,
                                    qcow_aio_write_cb, acb);
     if (acb->hd_aiocb == NULL)
@@ -1571,7 +1153,7 @@
 
     memset(s->l1_table, 0, l1_length);
     if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
-        return -1;
+	return -1;
     ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
     if (ret < 0)
         return ret;
@@ -1637,10 +1219,8 @@
         /* could not compress: write normal cluster */
         qcow_write(bs, sector_num, buf, s->cluster_sectors);
     } else {
-        cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
-                                              out_len);
-        if (!cluster_offset)
-            return -1;
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
+                                            out_len, 0, 0);
         cluster_offset &= s->cluster_offset_mask;
         if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
             qemu_free(out_buf);
@@ -2225,19 +1805,26 @@
     BDRVQcowState *s = bs->opaque;
     int i, nb_clusters;
 
-    nb_clusters = size_to_clusters(s, size);
-retry:
-    for(i = 0; i < nb_clusters; i++) {
-        int64_t i = s->free_cluster_index++;
-        if (get_refcount(bs, i) != 0)
-            goto retry;
-    }
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
+    for(;;) {
+        if (get_refcount(bs, s->free_cluster_index) == 0) {
+            s->free_cluster_index++;
+            for(i = 1; i < nb_clusters; i++) {
+                if (get_refcount(bs, s->free_cluster_index) != 0)
+                    goto not_found;
+                s->free_cluster_index++;
+            }
 #ifdef DEBUG_ALLOC2
-    printf("alloc_clusters: size=%lld -> %lld\n",
-            size,
-            (s->free_cluster_index - nb_clusters) << s->cluster_bits);
+            printf("alloc_clusters: size=%lld -> %lld\n",
+                   size,
+                   (s->free_cluster_index - nb_clusters) << s->cluster_bits);
 #endif
-    return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
+            return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
+        } else {
+        not_found:
+            s->free_cluster_index++;
+        }
+    }
 }
 
 static int64_t alloc_clusters(BlockDriverState *bs, int64_t size)
@@ -2301,7 +1888,8 @@
     int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
     uint64_t *new_table;
     int64_t table_offset;
-    uint8_t data[12];
+    uint64_t data64;
+    uint32_t data32;
     int old_table_size;
     int64_t old_table_offset;
 
@@ -2340,10 +1928,13 @@
     for(i = 0; i < s->refcount_table_size; i++)
         be64_to_cpus(&new_table[i]);
 
-    cpu_to_be64w((uint64_t*)data, table_offset);
-    cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
+    data64 = cpu_to_be64(table_offset);
     if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
-                    data, sizeof(data)) != sizeof(data))
+                    &data64, sizeof(data64)) != sizeof(data64))
+        goto fail;
+    data32 = cpu_to_be32(refcount_table_clusters);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters),
+                    &data32, sizeof(data32)) != sizeof(data32))
         goto fail;
     qemu_free(s->refcount_table);
     old_table_offset = s->refcount_table_offset;
@@ -2572,7 +2163,7 @@
     uint16_t *refcount_table;
 
     size = bdrv_getlength(s->hd);
-    nb_clusters = size_to_clusters(s, size);
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
     refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t));
 
     /* header */
@@ -2624,7 +2215,7 @@
     int refcount;
 
     size = bdrv_getlength(s->hd);
-    nb_clusters = size_to_clusters(s, size);
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
     for(k = 0; k < nb_clusters;) {
         k1 = k;
         refcount = get_refcount(bs, k);


  

