From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH for 2.7 v3 1/1] qcow2: improve qcow2_co_write_zeroes()
Date: Fri, 13 May 2016 18:24:49 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 13.05.2016 at 18:09, Denis V. Lunev wrote:
> oops, the patch got committed... that is unexpected, but great ;)

Oops, that was not intended; I just forgot to remove it from the queue
when I realised that it wasn't ready yet.

Should I stage a revert or do you prefer to fix it on top?

> >>>>diff --git a/block/qcow2.c b/block/qcow2.c
> >>>>index 470734b..c2474c1 100644
> >>>>--- a/block/qcow2.c
> >>>>+++ b/block/qcow2.c
> >>>>@@ -2411,21 +2411,74 @@ finish:
> >>>>      return ret;
> >>>>  }
> >>>>+
> >>>>+static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
> >>>>+{
> >>>>+    BDRVQcow2State *s = bs->opaque;
> >>>>+    int nr;
> >>>>+    BlockDriverState *file;
> >>>>+    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
> >>>>+                                              s->cluster_sectors, &nr, &file);
> >>>>+    return res >= 0 && ((res & BDRV_BLOCK_ZERO) || !(res & BDRV_BLOCK_DATA));
> >>>Why did you add the !(res & BDRV_BLOCK_DATA) condition? This means that
> >>>all unallocated clusters return true, even if the backing file contains
> >>>non-zero data for them.
> >>this is correct. From my POV this means that this area is unallocated
> >>in the entire backing chain and thus it will be read as zeroes, so we
> >>could cover it with zeroes.
> >You're right that I made a mistake, I was thinking of the non-recursive
> >bdrv_get_block_status().
> >
> >However, I still think that we may not assume that !BDRV_BLOCK_DATA
> >means zero data, even though that affects only more obscure cases. We
> >have bdrv_unallocated_blocks_are_zero() to check whether the assumption
> >is true. However, bdrv_co_get_block_status() already checks this
> >internally and sets BDRV_BLOCK_ZERO in this case, so just checking
> >BDRV_BLOCK_ZERO in qcow2 should be good.
> >
> >Did you find a case where you got !DATA, but not ZERO, and assuming
> >zeroes was valid? If so, we may need to fix bdrv_co_get_block_status().
> actually we may have the following (artificial) case:
> - assume the backing store does not have bdrv_has_zero_init
> - and a qcow2 image on top of this file
> - reading from an unallocated block should return zeroes (no data in
>   either layer), so the qcow2 layer will return zeroes
> 
> It looks like we can end up in this situation.

qcow2 sets bdi->unallocated_blocks_are_zero = true, so in this case you
should correctly get BDRV_BLOCK_ZERO from bdrv_get_block_status().
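
So as far as I can tell, the fix is just to drop the !(res &
BDRV_BLOCK_DATA) part and rely on BDRV_BLOCK_ZERO alone. Untested sketch,
based on the is_zero_cluster() from your patch in block/qcow2.c:

    static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
    {
        BDRVQcow2State *s = bs->opaque;
        int nr;
        BlockDriverState *file;
        int64_t res = bdrv_get_block_status_above(bs, NULL, start,
                                                  s->cluster_sectors, &nr, &file);

        /* Only trust BDRV_BLOCK_ZERO here. bdrv_co_get_block_status() already
         * sets it for unallocated clusters when unallocated_blocks_are_zero
         * holds, so !BDRV_BLOCK_DATA alone must not be treated as "reads as
         * zeroes". */
        return res >= 0 && (res & BDRV_BLOCK_ZERO);
    }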

> >Hm, I see:
> >
> >     if (bs->bl.write_zeroes_alignment
> >         && num > bs->bl.write_zeroes_alignment) {
> >
> >Removing the second part should fix this, i.e. it would split a request
> >into two unaligned halves even if there is no aligned "bulk" in the
> >middle.
> >
> >I think it would match my expectations better, but maybe that's just me.
> >What do you think?
> actually the code here will not be significantly better (I presume),
> but I'll give it a try

Yes, I agree that it won't make the qcow2 code significantly simpler. I
just think that the semantics would be less surprising.
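
To make it a bit more concrete, roughly how I picture the alignment
handling in bdrv_co_do_write_zeroes() afterwards. This is a simplified,
untested sketch rather than the exact block/io.c code; the MIN() on the
head fragment and keeping the num > alignment guard on the tail branch
are details I'm adding here so the sketch stays correct for requests
shorter than one alignment unit:

    if (bs->bl.write_zeroes_alignment) {
        if (sector_num % bs->bl.write_zeroes_alignment != 0) {
            /* Unaligned head: shrink to the first aligned boundary, but
             * never beyond the request itself. */
            num = MIN(num, bs->bl.write_zeroes_alignment
                           - sector_num % bs->bl.write_zeroes_alignment);
        } else if (num > bs->bl.write_zeroes_alignment &&
                   (sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
            /* Unaligned tail: shorten to the last aligned boundary; num
             * cannot become zero because num > write_zeroes_alignment. */
            num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
        }
    }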

> >>>>+            cl_end = sector_num + nb_sectors - s->cluster_sectors;
> >>>>+            if (!is_zero_cluster(bs, cl_end)) {
> >>>>+                return -ENOTSUP;
> >>>>+            }
> >>>>+        }
> >>>>+
> >>>>+        qemu_co_mutex_lock(&s->lock);
> >>>>+        /* We can have new write after previous check */
> >>>>+        if (!is_zero_cluster_top_locked(bs, sector_num) ||
> >>>>+                (cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
> >>>>+            qemu_co_mutex_unlock(&s->lock);
> >>>>+            return -ENOTSUP;
> >>>>+        }
> >>>Just lock the mutex before the check, the possible optimisation for the
> >>>emulation case (which is slow anyway) isn't worth the additional code
> >>>complexity.
> >>bdrv_get_block_status_above(bs) takes s->lock inside. This lock is not
> >>recursive, so the code would hang. This is the problem that this split
> >>of checks is trying to address.
> >>
> >>Maybe we could make the lock recursive...
> >Maybe your version is not far from the best we can do then. It deserves
> >a comment, though, because it's not completely obvious.
> >
> >The other option that we have and that looks reasonable enough to me is
> >checking is_zero_cluster_top_locked() first and only if that returns
> >false, we check the block status of the backing chain, starting at
> >bs->backing->bs. This way we would bypass the recursive call and could
> >take the lock from the beginning. If we go that way, it deserves a
> >comment as well.
> >
> >Kevin
> OK. I'll send at least improved comments and (maybe) the removal of
> "&& num > bs->bl.write_zeroes_alignment" as a follow-up.

The most important part is actually fixing is_zero_cluster(), because
that's a real bug that can corrupt data in the !has_zero_init case. This
is also the reason why I would revert the patch if we don't have a fix
for this by the time of my next pull request.

The rest is just making things a bit nicer, so follow-ups are very
welcome, but not as critical.
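
For reference, one possible reading of the alternative I described above
(take s->lock from the beginning, look at the top layer first, and only
query the backing chain starting at bs->backing->bs so we never recurse
into our own lock). This is only an illustration; is_zero_cluster_locked
is a made-up name and the details would need checking against the rest
of the patch:

    static bool is_zero_cluster_locked(BlockDriverState *bs, int64_t start)
    {
        BDRVQcow2State *s = bs->opaque;
        int nr = s->cluster_sectors;
        BlockDriverState *file;
        uint64_t off;
        int64_t res;
        int ret;

        /* Called with s->lock held. Look at the top layer first. */
        ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
        if (ret == QCOW2_CLUSTER_ZERO) {
            return true;                     /* already reads as zeroes */
        }
        if (ret != QCOW2_CLUSTER_UNALLOCATED) {
            return false;                    /* data or error in the top layer */
        }
        if (!bs->backing) {
            return true;                     /* unallocated, no backing file */
        }

        /* Unallocated in this layer, so the answer depends on the backing
         * chain. Starting at bs->backing->bs avoids taking our own s->lock
         * recursively. */
        res = bdrv_get_block_status_above(bs->backing->bs, NULL, start,
                                          s->cluster_sectors, &nr, &file);
        return res >= 0 && (res & BDRV_BLOCK_ZERO);
    }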

Kevin
