qemu-devel

RE: [Qemu-devel][PATCH,RFC] Zero cluster dedup


From: Shahar Frank
Subject: RE: [Qemu-devel][PATCH,RFC] Zero cluster dedup
Date: Wed, 3 Sep 2008 06:07:26 -0700

> -----Original Message-----
> From: Kevin Wolf [mailto:address@hidden
> Sent: Wednesday, September 03, 2008 3:47 PM
> To: Shahar Frank
> Cc: address@hidden; Laurent Vivier
> Subject: Re: [Qemu-devel][PATCH,RFC] Zero cluster dedup
> 
> >>> +
> >>> +static int is_not_zero(const uint8_t *buf, int len)
> >>> +{
> >>> +    int i;
> >>> +    len >>= 2;
> >>> +    for(i = 0;i < len; i++) {
> >>> +        if (((uint32_t *)buf)[i] != 0)
> >>> +            return 1;
> >>> +    }
> >>> +    return 0;
> >>> +}
> >> Why negative logic? I'd prefer buf_is_zero(). Also, could probably
> >> be an inline function.
> >>
> >
> > You are completely right. I just took this function from qemu-img,
> > but this is not a very good excuse... ;)
> 
> Oh, I see. Then what about moving it into a common header file instead
> of copying?

OK.
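For reference, a positive-logic version along the lines Kevin suggests could look like the sketch below. The name buf_is_zero() and the inline placement are just the suggestion from this thread, not committed code; like the original, it scans 32-bit words, so len is assumed to be a multiple of 4 (always true for qcow2 cluster sizes).

```c
#include <stdint.h>

/* Positive-logic rewrite of is_not_zero(): returns 1 if the buffer
 * contains only zero bytes. Scans the buffer as 32-bit words, so len
 * is assumed to be a multiple of 4 and buf to be word-aligned, as the
 * original qemu-img helper also assumes. */
static inline int buf_is_zero(const uint8_t *buf, int len)
{
    int i;
    len >>= 2;
    for (i = 0; i < len; i++) {
        if (((const uint32_t *)buf)[i] != 0)
            return 0;
    }
    return 1;
}
```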

> 
> >>> +    uint64_t cluster_offset = sector_num << 9, poffset;
> >>> +
> >>> +    DEBUG("%s 0x%llx", bs->filename, cluster_offset);
> >>> +    if (!zero_dedup || (sector_num & (s->cluster_sectors - 1)) ||
> >>> +        *nb_sectors < cluster || is_not_zero(buf, s->cluster_size))
> >>> +        return 0;                /* cannot/should not dedup */
> >> Why do you check the value of nb_sectors? It's never used and only
> >> an output parameter.
> >
> > This is wrong; nb_sectors is an input/output parameter.
> 
> But if I'm not completely mistaken, you don't use its input value apart
> from this check. I just double-checked this and I still don't see its
> usage as an input parameter.

I will look into that again. Maybe you are right.

> 
> >>> +
> >>> +    DEBUG("%s zero dedup 0x%llx", bs->filename, cluster_offset);
> >>> +    *nb_sectors = cluster;       /* allocate exactly one cluster */
> >>> +
> >>> +    if (!s->zero_cluster) {
> >>> +        if (!(poffset = alloc_clusters_noref(bs, s->cluster_size)) ||
> >>> +            write_cluster(bs, sector_num, poffset, buf, cluster) < 0)
> >>> +                return 0;
> >>> +        s->zero_cluster = poffset;
> >>> +    }
> >> I'm not sure how one could avoid this best, but you are accumulating
> >> zero clusters because you create a new one for each VM run.
> >
> > This is true. It is not so bad, and the alternative as I see it is to
> > change the qcow2 format to remember the zero_cluster. At this stage I
> > don't think it is justified. On the other hand, it would be very nice
> > to add qcow header extension support for optional features. In fact my
> > initial implementation did this (in a backwards-compatible way), but I
> > intentionally left it out because it is a standalone issue.
> 
> Ok. What about leaving it as it is for now and just adding a TODO
> comment?

OK.
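To make the accumulation issue concrete: the lazy allocation being kept for now (with the TODO) boils down to a cached singleton that is forgotten on every restart. The DedupState struct and alloc_cluster() below are simplified stand-ins for the qcow2 internals, not the real API.

```c
#include <stdint.h>

/* Simplified model of the lazy shared-zero-cluster scheme discussed
 * above: the first all-zero write allocates one physical cluster and
 * every later zero write is pointed at it. DedupState and
 * alloc_cluster() are hypothetical stand-ins for the qcow2 state. */
typedef struct {
    uint64_t zero_cluster;  /* offset of the shared zero cluster,
                               0 = not allocated yet; because this is
                               not persisted in the image header, each
                               VM run allocates a fresh one -- the
                               accumulation Kevin points out */
    uint64_t next_free;     /* toy allocator cursor */
    uint32_t cluster_size;
} DedupState;

static uint64_t alloc_cluster(DedupState *s)
{
    uint64_t off = s->next_free;
    s->next_free += s->cluster_size;
    return off;
}

/* Returns the offset all-zero clusters should map to, allocating the
 * shared cluster on first use only. */
static uint64_t zero_cluster_offset(DedupState *s)
{
    if (!s->zero_cluster)
        s->zero_cluster = alloc_cluster(s);
    return s->zero_cluster;
}
```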

> 
> >>> @@ -1315,6 +1411,13 @@ static void qcow_aio_write_cb(void *opaque, int ret)
> >>>          qemu_aio_release(acb);
> >>>          return;
> >>>      }
> >>> +        acb->n = acb->nb_sectors;
> >>> +    } while ((ret = qcow_dedup(bs, acb->sector_num, acb->buf, &acb->n)) > 0);
> >> I'm not sure this loop is the right approach. When you get a big write
> >> request for zeros, what you are doing is to split it up into clusters
> >> and change and write the L2 table to disk for each single cluster.
> >>
> >> This is different from the other read/write functions which try to use
> >> as big chunks as possible to minimize the overhead.
> >
> > If I get your point, I tend to disagree. Even though there may be cases
> > where a large write will be broken up due to the zero dedup, I think
> > that will be rare, and the tradeoff of saving space should be worth it.
> 
> I don't think it will be that rare. After all, as you said in your
> initial comment, you are doing a cleanup of the complete disk. I'd
> expect that large requests will be quite common in such a case.
> 
> Please note that I don't talk about trading off disk space against
> writes here, it's about optimizing the dedup.
> 
> > As for the cluster granularity issue, it is meaningless for dedup
> > (zero or otherwise). Deduped clusters are rarely deduped to sequential
> > clusters, and anyhow a dedup write operation only changes the metadata
> > of the block (the logical offset's L2 entry plus the reference count
> > of the physical cluster), so no large-write optimization can be done.
> 
> This is still the L2 table which has to be written each time. Say we
> have a 64k write, then you could save 15 L2 table writes if qcow_dedup
> could handle zeroing multiple contiguous clusters at once. This could
> matter for performance when wiping lots of data.

You have a good point when it comes to a sequence of zero clusters (the
L2 writes can be optimized as you wrote above). I will re-implement
this.
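A minimal sketch of the batching idea, under the assumption that the write buffer covers whole clusters: count the run of leading zero clusters first, then update all of their L2 entries in one pass. The helper names are hypothetical; the real patch would still have to fold this into qcow_dedup() and the L2 update path.

```c
#include <stddef.h>
#include <stdint.h>

/* Helper: 1 if the buffer is all zero bytes; len is assumed to be a
 * multiple of 4, as cluster sizes are. */
static int buf_is_zero(const uint8_t *buf, int len)
{
    int i;
    len >>= 2;
    for (i = 0; i < len; i++) {
        if (((const uint32_t *)buf)[i] != 0)
            return 0;
    }
    return 1;
}

/* Counts how many whole clusters at the start of a write request are
 * zero, so that their L2 entries can be rewritten in one pass instead
 * of one L2 table write per cluster. */
static int count_zero_clusters(const uint8_t *buf, int nb_clusters,
                               int cluster_size)
{
    int n = 0;
    while (n < nb_clusters &&
           buf_is_zero(buf + (size_t)n * cluster_size, cluster_size))
        n++;
    return n;
}
```

For a 64k request with 4k clusters this turns up to 16 separate L2 writes into one, which is the saving Kevin describes.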

> 
> Laurent, do you have an opinion on this? You should know this stuff
> better than me as you did the according patches.
> 
> Kevin

Shahar



