qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for


From: Denis V. Lunev
Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format
Date: Thu, 23 Apr 2015 12:36:40 +0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 23/04/15 12:20, Stefan Hajnoczi wrote:
On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
On 22/04/15 16:08, Stefan Hajnoczi wrote:
On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num)
+{
+    BDRVParallelsState *s = bs->opaque;
+    uint32_t idx, offset, tmp;
+    int64_t pos;
+    int ret;
+
+    idx = sector_num / s->tracks;
+    offset = sector_num % s->tracks;
+
+    if (idx >= s->catalog_size) {
+        return -EINVAL;
+    }
+    if (s->catalog_bitmap[idx] != 0) {
+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+    }
+
+    pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
+    s->catalog_bitmap[idx] = pos / s->off_multiplier;
+
+    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
+
+    ret = bdrv_pwrite_sync(bs->file,
+            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeof(tmp));
What is the purpose of the sync?
This is necessary to preserve image consistency on crash from
my point of view. There is no check consistency at the moment.
The sync will be removed later when proper crash detection
code will be added (patches 19, 20, 21)
Let's look at possible orderings in case of failure:

A. BAT update
B. Data write

This sync enforces A, B ordering.  If we can see B, then A must also
have happened thanks to the sync.

But A, B ordering is too conservative.  Imagine B, A ordering and the
failure where we crash before A.  It means we wrote the data but never
linked it into the BAT.

What happens in that case?  We've leaked a cluster in the underlying
image file but it doesn't corrupt the visible disk from the guest
point-of-view.

Because your implementation uses truncate to extend the file size before
A, even the A, B failure case results in a leaked cluster.  So the B, A
case is not worse in any way.

Why do other image formats sync cluster allocation updates?  Because
they support backing files and in that case an A, B ordering results in
data corruption so they enforce B, A ordering (the opposite of what
you're trying to do!).

The reason why A, B ordering results in data corruption when backing
files are in use is because the guest's write request might touch only a
subset of the cluster (a couple of sectors out of the whole cluster).
So the guest needs to copy the remaining sectors from the backing file.
If there is a dangling BAT entry like in the A, B failure case, then the
guest will see a zeroed cluster instead of the contents of the backing
file.  This is a data corruption, but only if a backing file is being
used!

So the sync is not necessary, both A, B and B, A ordering work for
block/parallels.c.

Stefan
this is very interesting speculation. I have never considered the
problem from this point of view. Thank you very much for this!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]