Re: [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard command


From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH 1/3] nbd: Only try to send flush/discard commands if connected to the NBD server
Date: Thu, 25 Oct 2012 18:09:34 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Kevin Wolf wrote:
> On 24.10.2012 16:32, Jamie Lokier wrote:
> > Kevin Wolf wrote:
> >> On 24.10.2012 14:16, Nicholas Thomas wrote:
> >>> On Tue, 2012-10-23 at 16:02 +0100, Jamie Lokier wrote:
> >>>> Since the I/O _order_ before, and sometimes after, flush, is important
> >>>> for data integrity, this needs to be maintained when I/Os are queued in
> >>>> the disconnected state -- including those which were inflight at the
> >>>> time disconnect was detected and then retried on reconnect.
> >>>
> >>> Hmm, discussing this on IRC I was told that it wasn't necessary to
> >>> preserve order - although I forget the fine detail. Depending on the
> >>> implementation of qemu's coroutine mutexes, operations may not actually
> >>> be performed in order right now - it's not too easy to work out what's
> >>> happening.
> >>
> >> It's possible to reorder, but it must be consistent with the order in
> >> which completion is signalled to the guest. The semantics of flush is
> >> that at the point that the flush completes, all writes to the disk that
> >> already have completed successfully are stable. It doesn't say anything
> >> about writes that are still in flight, they may or may not be flushed to
> >> disk.
> > 
> > I admit I wasn't thinking clearly how much ordering NBD actually
> > guarantees (or if there's ordering the guest depends on implicitly
> > even if it's not guaranteed in specification), and how that is related
> > within QEMU to virtio/FUA/NCQ/TCQ/SCSI-ORDERED ordering guarantees
> > that the guest expects for various emulated devices and their settings.
> > 
> > The ordering (if any) needed from the NBD driver (or any backend) is
> > going to depend on the assumptions baked into the interface between
> > QEMU device emulation <-> backend.
> > 
> > E.g. if every device emulation waited for all outstanding writes to
> > complete before sending a flush, then it wouldn't matter how the
> > backend reordered its requests, even getting the completions out of
> > order.
> > 
> > Is that relationship documented (and conformed to)?
> 
> No, like so many other things in qemu it's not spelt out explicitly.
> However, as I understand it it's the same behaviour as real hardware
> has, so device emulation at least for the common devices doesn't have to
> implement anything special for it. If the hardware even supports
> parallel requests, otherwise it would automatically only have a single
> request in flight (like IDE).

That's why I mention virtio/FUA/NCQ/TCQ/SCSI-ORDERED, which are quite
common.

They are features of devices which support multiple parallel requests,
but with certain ordering constraints conveyed by or expected by the
guest, and those constraints have to be preserved when the requests are
mapped onto a fully asynchronous QEMU backend.

That means they are features of the hardware which device emulations
_do_ have to implement.  If they don't, the storage is unreliable across
events like host power loss and virtual power removal.

If the backends are allowed to explicitly have no coupling between
different request types (even flush/discard and write), and ordering
constraints are being enforced by the order in which device emulations
submit and wait, that's fine.
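
A minimal sketch of that arrangement, as a toy model rather than QEMU
code. ReorderingBackend and DrainingDevice are hypothetical names, and
the per-write delay only stands in for a backend that completes requests
in whatever order it likes. The device side enforces ordering purely by
waiting for its outstanding writes before it submits the flush:

```python
import asyncio


class ReorderingBackend:
    """Hypothetical backend with no coupling between request types."""

    def __init__(self):
        self.volatile = {}  # completed writes that are not yet stable
        self.stable = {}    # data that would survive power removal
        self.log = []       # completion order, for inspection

    async def write(self, sector, data, delay):
        # Completion order depends only on the delay, not on submit order.
        await asyncio.sleep(delay)
        self.volatile[sector] = data
        self.log.append(("write", sector))

    async def flush(self):
        # Makes stable only those writes that have already completed.
        self.stable.update(self.volatile)
        self.volatile.clear()
        self.log.append(("flush",))


class DrainingDevice:
    """Hypothetical device emulation: order enforced by submit-and-wait."""

    def __init__(self, backend):
        self.backend = backend
        self.inflight = set()

    def submit_write(self, sector, data, delay=0.0):
        task = asyncio.create_task(self.backend.write(sector, data, delay))
        self.inflight.add(task)
        task.add_done_callback(self.inflight.discard)
        return task

    async def flush(self):
        # Wait for every outstanding write before the flush is submitted.
        if self.inflight:
            await asyncio.gather(*self.inflight)
        await self.backend.flush()


async def demo():
    backend = ReorderingBackend()
    dev = DrainingDevice(backend)
    dev.submit_write(1, b"a", delay=0.02)
    dev.submit_write(2, b"b", delay=0.01)  # completes before sector 1
    await dev.flush()
    return backend


backend = asyncio.run(demo())
print(backend.log)
print(backend.stable)
```

In this model the writes complete out of submission order, yet both are
stable once the flush returns, because the device never lets a flush
overtake a write it is still waiting on.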

I mention this because POSIX aio_fsync() is _not_ fully decoupled
according to its specification.

So it might be that some device emulations are depending on the
semantics of aio_fsync() or the QEMU equivalent by now, and randomly
reordering requests in the NBD driver (or any other backend) in unusual
circumstances would break those semantics.
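
To illustrate the difference, here is a simplified model under stated
assumptions, not QEMU or libc code. "completed-only" stands for the
flush semantics Kevin describes (only writes whose completion was
already signalled must be stable). "queued-at-call" is my paraphrase of
the POSIX wording for aio_fsync(), which covers operations queued at the
time of the call. The function names and request ids are hypothetical:

```python
def writes_flush_must_cover(queued, completed, semantics):
    """Return the writes a flush has to make stable under each reading."""
    if semantics == "completed-only":
        return set(completed)
    if semantics == "queued-at-call":
        return set(queued)
    raise ValueError(f"unknown semantics: {semantics}")


def deferral_is_safe(queued, completed, deferred, semantics):
    """True if moving `deferred` writes after the flush breaks no guarantee."""
    required = writes_flush_must_cover(queued, completed, semantics)
    return not (set(deferred) & required)


queued = ["w1", "w2", "w3"]  # submitted before the flush
completed = ["w1"]           # completion already signalled to the guest
deferred = ["w3"]            # e.g. retried after reconnect, landing after the flush

print(deferral_is_safe(queued, completed, deferred, "completed-only"))  # True
print(deferral_is_safe(queued, completed, deferred, "queued-at-call"))  # False
```

Under the first reading the backend may push the in-flight write past
the flush. Under the second it may not, which is the case a device
emulation could be relying on.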

-- Jamie


