From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date: Wed, 14 Mar 2012 13:37:50 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1

On 14.03.2012 13:14, Paolo Bonzini wrote:
>> Paolo mentioned a use case as a fast way for guests to write zeros, but
>> is it really faster than a normal write when we have to emulate it by a
>> bdrv_write with a temporary buffer of zeros? 
> 
> No, of course not.
> 
>> On the other hand we have
>> the cases where discard really means "I don't care about the data any
>> more" and emulating it by writing zeros is just a waste of resources there.
>>
>> So I think we only want to advertise that discard zeroes data if we can
>> do it efficiently. This means that the format does support it, and that
>> the device is able to communicate the discard granularity (= cluster
>> size) to the guest OS.
> 
> Note that the discard granularity is only a hint, so it's really more a
> maximum suggested value than a granularity.  Outside of a cluster
> boundary the format would still have to write zeros manually.

You're talking about SCSI here, I guess? This would be one case where
being able to define sane semantics for virtio-blk would have been an
advantage... I had hoped that SCSI was already sane, but if it doesn't
distinguish between "I don't care about this any more" and "I want to
have zeros here", then I'm afraid I can't call it sane any more.

We can make the conditions even stricter, i.e. allow it only if the
protocol can pass through discards for unaligned requests. This wouldn't
free clusters on an image format level, but at least on a file system level.
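
To make the alignment issue concrete, here is a rough sketch of how a
zeroing request splits around cluster boundaries. Only the middle part
covers whole clusters and could be handled at the image format level; the
head and tail would need either a protocol-level discard or a manual write
of zeros. The type and function names (ZeroSplit, split_zero_request) are
made up for illustration and are not part of the QEMU block layer; the
sketch assumes cluster_sectors is nonzero and ignores overflow.

```c
#include <stdint.h>

/* Hypothetical helper type, not a QEMU structure: describes how a request
 * covering sectors [start, start + count) splits around cluster boundaries. */
typedef struct ZeroSplit {
    uint64_t head;    /* sectors before the first cluster boundary */
    uint64_t middle;  /* sectors that cover whole clusters */
    uint64_t tail;    /* sectors after the last whole cluster */
} ZeroSplit;

/* Assumes cluster_sectors > 0 and that start + count does not overflow. */
static ZeroSplit split_zero_request(uint64_t start, uint64_t count,
                                    uint64_t cluster_sectors)
{
    ZeroSplit s = { 0, 0, 0 };
    uint64_t end = start + count;
    /* First cluster boundary at or after start. */
    uint64_t first = (start + cluster_sectors - 1) / cluster_sectors
                     * cluster_sectors;
    /* Last cluster boundary at or before end. */
    uint64_t last = end / cluster_sectors * cluster_sectors;

    if (first >= last) {
        /* No whole cluster is covered: everything is unaligned. */
        s.head = count;
        return s;
    }

    s.head = first - start;
    s.middle = last - first;
    s.tail = end - last;
    return s;
}
```

With 128-sector clusters, a request for 300 sectors starting at sector 100
splits into 28 unaligned sectors, 256 sectors (two whole clusters), and
another 16 unaligned sectors.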

> Also, Linux for example will only round the number of sectors down to
> the granularity, not the start sector.  Rereading the code, for SCSI we
> want to advertise a zero granularity (aka do whatever you want),
> otherwise we may get only misaligned discard requests and end up writing
> zeroes inefficiently all the time.

Does this make sense with real hardware or is it a Linux bug?

> The problem is that advertising discard_zeroes_data based on the backend
> calls for trouble as soon as you migrate between storage formats,
> filesystems or disks.

True. You would have to emulate if you migrate from a source that can
discard to zeros efficiently to a destination that can't.
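
A minimal sketch of that decision, under the assumption that each backend
can be summarized by two capability flags. BackendCaps, ZeroMethod and both
functions are hypothetical names used only for illustration; they are not
existing QEMU interfaces.

```c
#include <stdbool.h>

/* Hypothetical capability summary of a storage backend. */
typedef struct BackendCaps {
    bool can_discard;      /* backend accepts discard requests */
    bool discard_zeroes;   /* discarded ranges read back as zeros */
} BackendCaps;

typedef enum ZeroMethod {
    ZERO_BY_DISCARD,       /* efficient: discard already yields zeros */
    ZERO_BY_WRITE          /* emulated: write a buffer of zeros */
} ZeroMethod;

/* Pick how a "discard that zeroes data" must be carried out on a backend. */
static ZeroMethod pick_zero_method(const BackendCaps *caps)
{
    if (caps->can_discard && caps->discard_zeroes) {
        return ZERO_BY_DISCARD;
    }
    return ZERO_BY_WRITE;
}

/* True if the guest was given efficient zeroing on the source but the
 * destination can only provide it by writing zeros. */
static bool needs_emulation_after_migration(const BackendCaps *src,
                                            const BackendCaps *dst)
{
    return pick_zero_method(src) == ZERO_BY_DISCARD &&
           pick_zero_method(dst) == ZERO_BY_WRITE;
}
```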

In the end, I guess we'll just have to accept that we can't fix bad
semantics of ATA and SCSI, and just need to decide whether "I don't
care" or "I want to have zeros" is more common. My feeling is that "I
don't care" is the more useful operation because it can't be expressed
otherwise, but I haven't checked what guests really do.

> (BTW, if the backing file allows discard and zeroes data, efficient
> write-zeroes could be done in qcow2 by allocating a cluster and
> discarding its contents.  It's similar to how you do preallocated metadata).

Yes.

Kevin


