qemu-devel

From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard commands if connected to the NBD server
Date: Fri, 26 Oct 2012 09:59:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0

Am 25.10.2012 19:09, schrieb Jamie Lokier:
> Kevin Wolf wrote:
>> Am 24.10.2012 16:32, schrieb Jamie Lokier:
>>> Kevin Wolf wrote:
>>>> Am 24.10.2012 14:16, schrieb Nicholas Thomas:
>>>>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
>>>>>> Since the I/O _order_ before, and sometimes after, flush, is important
>>>>>> for data integrity, this needs to be maintained when I/Os are queued in
>>>>>> the disconnected state -- including those which were inflight at the
>>>>>> time disconnect was detected and then retried on reconnect.
>>>>>
>>>>> Hmm, discussing this on IRC I was told that it wasn't necessary to
>>>>> preserve order - although I forget the fine detail. Depending on the
>>>>> implementation of qemu's coroutine mutexes, operations may not actually
>>>>> be performed in order right now - it's not too easy to work out what's
>>>>> happening.
>>>>
>>>> It's possible to reorder, but it must be consistent with the order in
>>>> which completion is signalled to the guest. The semantics of flush is
>>>> that at the point that the flush completes, all writes to the disk that
>>>> already have completed successfully are stable. It doesn't say anything
>>>> about writes that are still in flight, they may or may not be flushed to
>>>> disk.
>>>
>>> I admit I wasn't thinking clearly how much ordering NBD actually
>>> guarantees (or if there's ordering the guest depends on implicitly
>>> even if it's not guaranteed in specification), and how that is related
>>> within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
>>> that the guest expects for various emulated devices and their settings.
>>>
>>> The ordering (if any) needed from the NBD driver (or any backend) is
>>> going to depend on the assumptions baked into the interface between
>>> QEMU device emulation <-> backend.
>>>
>>> E.g. if every device emulation waited for all outstanding writes to
>>> complete before sending a flush, then it wouldn't matter how the
>>> backend reordered its requests, even getting the completions out of
>>> order.
>>>
>>> Is that relationship documented (and conformed to)?
>>
>> No, like so many other things in qemu, it's not spelt out explicitly.
>> However, as I understand it, it's the same behaviour that real hardware
>> has, so device emulation, at least for the common devices, doesn't have
>> to implement anything special for it. That is, if the hardware supports
>> parallel requests at all; otherwise it automatically has only a single
>> request in flight (like IDE).
> 
> That's why I mention virtio/FUA/NCQ/TCQ/SCSI-ORDERED, which are quite
> common.
> 
> They are features of devices which support multiple parallel requests,
> but with certain ordering constraints conveyed by or expected by the
> guest, which have to be ensured when they are mapped onto a fully
> asynchronous QEMU backend.
> 
> That means they are features of the hardware which device emulations
> _do_ have to implement.  If they don't, the storage is unreliable on
> things like host power removal and virtual power removal.

Yes, device emulations that need to maintain a given order must take
care to wait for the previous requests to complete.

> If the backends are allowed to explicitly have no coupling between
> different request types (even flush/discard and write), and ordering
> constraints are being enforced by the order in which device emulations
> submit and wait, that's fine.
> 
> I mention this because POSIX aio_fsync() is _not_ fully decoupled
> according to its specification.
> 
> So it might be that some device emulations are depending on the
> semantics of aio_fsync() or the QEMU equivalent by now, and random
> reordering in the NBD driver (or any other backend) in unusual
> circumstances would break those semantics.

qemu AIO has had these semantics ever since bdrv_aio_flush() was
introduced, and image files behave the same way. So I don't see any
problem with NBD making use of the same semantics.

Kevin


