On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
On 22/04/15 16:08, Stefan Hajnoczi wrote:
On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num)
+{
+ BDRVParallelsState *s = bs->opaque;
+ uint32_t idx, offset, tmp;
+ int64_t pos;
+ int ret;
+
+ idx = sector_num / s->tracks;
+ offset = sector_num % s->tracks;
+
+ if (idx >= s->catalog_size) {
+ return -EINVAL;
+ }
+ if (s->catalog_bitmap[idx] != 0) {
+ return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
+ }
+
+ pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
+ bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
+ s->catalog_bitmap[idx] = pos / s->off_multiplier;
+
+ tmp = cpu_to_le32(s->catalog_bitmap[idx]);
+
+ ret = bdrv_pwrite_sync(bs->file,
+ sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeof(tmp));
What is the purpose of the sync?
This is necessary to preserve image consistency on crash from
my point of view. There is no check consistency at the moment.
The sync will be removed later when proper crash detection
code will be added (patches 19, 20, 21)
Let's look at possible orderings in case of failure:
A. BAT update
B. Data write
This sync enforces A, B ordering. If we can see B, then A must also
have happened thanks to the sync.
But A, B ordering is too conservative. Imagine B, A ordering and the
failure where we crash before A. It means we wrote the data but never
linked it into the BAT.
What happens in that case? We've leaked a cluster in the underlying
image file but it doesn't corrupt the visible disk from the guest
point-of-view.
Because your implementation uses truncate to extend the file size before
A, even the A, B failure case results in a leaked cluster. So the B, A
case is not worse in any way.
Why do other image formats sync cluster allocation updates? Because
they support backing files and in that case an A, B ordering results in
data corruption so they enforce B, A ordering (the opposite of what
you're trying to do!).
The reason why A, B ordering results in data corruption when backing
files are in use is because the guest's write request might touch only a
subset of the cluster (a couple of sectors out of the whole cluster).
So the guest needs to copy the remaining sectors from the backing file.
If there is a dangling BAT entry like in the A, B failure case, then the
guest will see a zeroed cluster instead of the contents of the backing
file. This is a data corruption, but only if a backing file is being
used!
So the sync is not necessary, both A, B and B, A ordering work for
block/parallels.c.
Stefan