qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format
Date: Thu, 23 Apr 2015 10:20:31 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
> On 22/04/15 16:08, Stefan Hajnoczi wrote:
> >On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
> >>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num)
> >>+{
> >>+    BDRVParallelsState *s = bs->opaque;
> >>+    uint32_t idx, offset, tmp;
> >>+    int64_t pos;
> >>+    int ret;
> >>+
> >>+    idx = sector_num / s->tracks;
> >>+    offset = sector_num % s->tracks;
> >>+
> >>+    if (idx >= s->catalog_size) {
> >>+        return -EINVAL;
> >>+    }
> >>+    if (s->catalog_bitmap[idx] != 0) {
> >>+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + 
> >>offset;
> >>+    }
> >>+
> >>+    pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
> >>+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
> >>+    s->catalog_bitmap[idx] = pos / s->off_multiplier;
> >>+
> >>+    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
> >>+
> >>+    ret = bdrv_pwrite_sync(bs->file,
> >>+            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, 
> >>sizeof(tmp));
> >What is the purpose of the sync?
> This is necessary to preserve image consistency on crash from
> my point of view. There is no check consistency at the moment.
> The sync will be removed later when proper crash detection
> code will be added (patches 19, 20, 21)

Let's look at possible orderings in case of failure:

A. BAT update
B. Data write

This sync enforces A, B ordering.  If we can see B, then A must also
have happened thanks to the sync.

But A, B ordering is too conservative.  Imagine B, A ordering and the
failure where we crash before A.  It means we wrote the data but never
linked it into the BAT.

What happens in that case?  We've leaked a cluster in the underlying
image file but it doesn't corrupt the visible disk from the guest
point-of-view.

Because your implementation uses truncate to extend the file size before
A, even the A, B failure case results in a leaked cluster.  So the B, A
case is not worse in any way.

Why do other image formats sync cluster allocation updates?  Because
they support backing files and in that case an A, B ordering results in
data corruption so they enforce B, A ordering (the opposite of what
you're trying to do!).

The reason why A, B ordering results in data corruption when backing
files are in use is because the guest's write request might touch only a
subset of the cluster (a couple of sectors out of the whole cluster).
So the guest needs to copy the remaining sectors from the backing file.
If there is a dangling BAT entry like in the A, B failure case, then the
guest will see a zeroed cluster instead of the contents of the backing
file.  This is a data corruption, but only if a backing file is being
used!

So the sync is not necessary, both A, B and B, A ordering work for
block/parallels.c.

Stefan

Attachment: pgpdD5thk1KUw.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]