From: Kevin Wolf
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v2 00/11] aio: Introduce handler type to fix nested aio_poll for dataplane
Date: Mon, 14 Sep 2015 10:40:35 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 14.09.2015 at 09:27, Fam Zheng wrote:
> On Fri, 09/11 14:22, Kevin Wolf wrote:
> > On 11.09.2015 at 13:46, Fam Zheng wrote:
> > > On Fri, 09/11 12:39, Kevin Wolf wrote:
> > > > On 29.07.2015 at 06:42, Fam Zheng wrote:
> > > > > v2: Switch to disable/enable model. [Paolo]
> > > > > 
> > > > > Most existing nested aio_poll()'s in the block layer are inconsiderate
> > > > > of dispatching potential new r/w requests from ioeventfds and nbd
> > > > > exports, which might result in responsiveness issues (e.g.
> > > > > bdrv_drain_all will not return when new requests keep coming), or even
> > > > > wrong semantics (e.g. qmp_transaction cannot enforce atomicity due to
> > > > > aio_poll in bdrv_drain_all).
> > > > > 
> > > > > Previous attempts to address this issue include new op blocker[1],
> > > > > bdrv_lock[2] and nested AioContext (patches not posted to qemu-devel).
> > > > > 
> > > > > This approach is based on the idea proposed by Paolo Bonzini. The
> > > > > original idea is introducing "aio_context_disable_client /
> > > > > aio_context_enable_client to filter AioContext handlers according to
> > > > > the "client", e.g. AIO_CLIENT_DATAPLANE (ioeventfd),
> > > > > AIO_CLIENT_PROTOCOL, AIO_CLIENT_NBD_SERVER, AIO_CLIENT_CONTEXT, ...
> > > > > Extend aio_set_{event_notifier,fd}_handler to pass a client (type)."
> > > > > 
> > > > > After this series, block layer aio_poll() will only process those
> > > > > "protocol" fds that are used in block I/O, plus the ctx->notifier for
> > > > > aio_notify(); other aio_poll()'s remain unchanged.
> > > > > 
> > > > > The biggest advantage over approaches [1] and [2] is that no change is
> > > > > needed in the virtio-{blk,scsi}-dataplane code, and this doesn't
> > > > > depend on converting QMP to coroutines.
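
For illustration only, the interface quoted above could be sketched roughly
like this. The AIO_CLIENT_* names and the disable/enable pair come from the
cover letter; the exact prototypes, flag values and the helper below are
assumptions, not the actual patches:

    /* Hypothetical sketch of the proposed client-type filtering, i.e. how
     * the declarations in include/block/aio.h might change.  Values are
     * illustrative only. */
    #define AIO_CLIENT_CONTEXT     (1 << 0)   /* ctx->notifier etc. */
    #define AIO_CLIENT_PROTOCOL    (1 << 1)   /* block I/O fds */
    #define AIO_CLIENT_DATAPLANE   (1 << 2)   /* virtio ioeventfds */
    #define AIO_CLIENT_NBD_SERVER  (1 << 3)   /* NBD export sockets */

    /* Handlers would be registered with a client type ... */
    void aio_set_fd_handler(AioContext *ctx, int fd, int client_type,
                            IOHandler *io_read, IOHandler *io_write,
                            void *opaque);

    /* ... so a nested event loop can mask whole classes of handlers: */
    void aio_context_disable_client(AioContext *ctx, int client_type);
    void aio_context_enable_client(AioContext *ctx, int client_type);

    /* E.g. a drain that must not dispatch new guest requests could be
     * bracketed like this (hypothetical helper): */
    static void drain_without_dataplane(AioContext *ctx)
    {
        aio_context_disable_client(ctx, AIO_CLIENT_DATAPLANE);
        bdrv_drain_all();
        aio_context_enable_client(ctx, AIO_CLIENT_DATAPLANE);
    }
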
> > > > 
> > > > It seems that I haven't replied on the mailing list yet, even though I
> > > > think I already said this in person at KVM Forum: This series fixes only
> > > > a special case of the real problem, which is that bdrv_drain/all at a
> > > > single point doesn't make a lot of sense, but needs to be replaced by a
> > > > whole section with exclusive access, like a bdrv_drained_begin/end pair.
> > > > 
> > > > To be clear: Anything that works with types of users instead of
> > > > individual users is bound to fall short of being a complete solution. I
> > > > would rather avoid partial solutions when we know there is a bigger problem.
> > > > 
> > > > This series addresses your immediate need of protecting against new data
> > > > plane requests, which it arguably achieves. The second case I always
> > > > have in mind is Berto's case where he has multiple streaming block jobs
> > > > in the same backing file chain [1].
> > > > 
> > > > This involves a bdrv_reopen() of the target BDS to make it writable, and
> > > > bdrv_reopen() uses bdrv_drain_all() so drivers don't have to cope with
> > > > running requests while reopening themselves. It can however involve
> > > > nested event loops for synchronous operations like bdrv_flush(), and if
> > > > those can process completions of block jobs, which can respond by doing
> > > > anything to the respective node, things can go wrong.
> > > 
> > > Just to get a better idea of bdrv_drained_begin/end, could you explain
> > > how to use the pair to fix the above problem?
> > 
> > How to use it is the easy part: in bdrv_reopen_multiple(), you would
> > replace the existing bdrv_drain_all() with begin and you would add the
> > corresponding end right before the return statement.
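
As a rough sketch of that change (bdrv_reopen_multiple() is heavily
abbreviated here, and whether the begin/end calls are global or per-BDS is
left open in the discussion, so treat the names as assumptions):

    /* Sketch only: the real bdrv_reopen_multiple() prepares, commits or
     * aborts every entry in bs_queue; only the drain placement matters. */
    int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
    {
        int ret;

        bdrv_drained_begin();        /* was: bdrv_drain_all() */

        /* ... prepare/commit/abort each entry in bs_queue ... */
        ret = 0;

        bdrv_drained_end();          /* right before the return statement */
        return ret;
    }
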
> 
> OK, so the completion of other jobs won't happen because we only complete
> the relevant requests?
> 
> Does block_job_pause() work here?

If you find all block jobs that do something with the BDS,
block_job_pause() would work, yes. But if you go further down that road,
you end up not with a design that considers the problem, but with
special casing every single user. That doesn't feel very sustainable.

> > > > You don't solve this by adding client types (the problematic request
> > > > would then be PROTOCOL in your proposal and you can never exclude it),
> > > > but you really need to have bdrv_drained_begin/end pairs, where only
> > > > requests issued in between are processed and everything else waits.
> > > 
> > > What do you mean by "only requests issued in between are processed"?
> > > Where are the requests from?
> > 
> > Generally speaking, you would have code that looks like this:
> > 
> >     bdrv_drained_begin()
> >     ...
> >     bdrv_something_synchronous()
> >     ...
> >     bdrv_drained_end()
> > 
> > You want to process everything that is necessary for completing
> > bdrv_something_synchronous(), but nothing else.
> > 
> > The trickier question is how to implement this. I know that it's much
> > easier to say that your series doesn't work than to actually propose
> > something else that works...
> 
> I see the basic idea but that is still way too obscure.  How would
> bdrv_drained_begin/end know what is necessary for completing
> bdrv_something_synchronous()?

That's not about the interface, but about the implementation, which I
tried to answer below.

> > One relatively obvious answer we found when we discussed this a while
> > back was some kind of a recursive CoRwLock (reader = in-flight request;
> > writer = drained section), but that obviously requires that you're
> > running in a coroutine if you want to do something with a drained
> > request queue.
> > 
> > I'm also not totally happy with the requirement of taking a reader lock
> > more or less everywhere. But I'm not sure yet if there is a good
> > alternative that can achieve the same.
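
To make that idea a bit more concrete, a minimal sketch using QEMU's existing
CoRwlock could look like the following. The recursive behaviour mentioned
above is not handled here, and none of these helpers exist in the tree; they
are assumptions for illustration only:

    #include "qemu/coroutine.h"

    /* Every in-flight request takes the lock shared, a drained section takes
     * it exclusive.  CoRwlock is not recursive, so requests issued from
     * inside a drained section would need more than this.
     * qemu_co_rwlock_init(&drain_lock) has to run once at startup. */
    static CoRwlock drain_lock;

    /* called from coroutine context for each request */
    static void coroutine_fn request_begin(void)
    {
        qemu_co_rwlock_rdlock(&drain_lock);   /* reader = in-flight request */
    }

    static void coroutine_fn request_end(void)
    {
        qemu_co_rwlock_unlock(&drain_lock);
    }

    /* called from coroutine context by whoever needs exclusive access */
    static void coroutine_fn drained_section_begin(void)
    {
        qemu_co_rwlock_wrlock(&drain_lock);   /* writer = drained section */
    }

    static void coroutine_fn drained_section_end(void)
    {
        qemu_co_rwlock_unlock(&drain_lock);
    }
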
> 
> We're basically trying to prevent new requests from being submitted, but
> couldn't this blocking be a complication itself? A CoRwLock, if any, would
> be implemented with something like a CoQueue, but considering the existing
> CoQueue in BDS (throttled_reqs), isn't that what we want to keep *empty*
> between the bdrv_drained_begin/end pair?  Now we're talking about queueing
> requests to another CoQueue, which feels contradictory to me.

You always queue requests _somewhere_. If they aren't queued in the
block layer, they are in the virtio ring or elsewhere. I think the point
of bdrv_drain() is that the block drivers don't have any pending
requests; they can be queued elsewhere without hurting anyone.

Kevin


