From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v6 01/13] block: Add op blocker type "device IO"
Date: Wed, 27 May 2015 11:07:45 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 26.05.2015 at 16:24, Max Reitz wrote:
> On 26.05.2015 16:22, Kevin Wolf wrote:
> >On 21.05.2015 at 08:42, Fam Zheng wrote:
> >>It blocks device IO.
> >What I'm missing here is a description of the case that it protects
> >against. Blocking device I/O doesn't sound like a valid thing to want
> >per se, but it might be true that we need it as long as we don't have
> >the "real" op blockers that Jeff is working on. However, as long as you
> >don't say why you want that, I can't say whether it's true or not.
> >
> >And of course, it doesn't block anything yet after this patch, it's just
> >set or cleared without any effect. In fact, it seems that even at the
> >end of the series, there is no bdrv_op_is_blocked() call that checks for
> >BLOCK_OP_TYPE_DEVICE_IO. Is this intentional?
> 
> Probably yes, because patches 2 and 3 introduce a notifier for when
> op blockers are set up and removed. virtio-blk and virtio-scsi use
> it to pause device I/O when a blocker is set up and to resume it
> when the blocker is removed.

Indeed, I missed that.
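
To spell out the mechanism for readers who missed it like I did: the
series lets a device register a notifier that fires when a blocker is
set up or removed, and the device pauses or resumes its request
processing in the callback. A rough sketch (the event structure and
the names here are my guesses; the series may use different ones):

    static void virtio_blk_op_blocker_cb(Notifier *notifier, void *data)
    {
        BlockOpEvent *event = data;   /* hypothetical event struct */

        if (event->type != BLOCK_OP_TYPE_DEVICE_IO) {
            return;
        }

        if (event->blocking) {
            /* blocker set up: stop processing the virtqueue and let
             * in-flight requests complete */
        } else {
            /* blocker removed: kick the virtqueue again */
        }
    }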

Having thought a bit more about this, I think we need to carefully check
whether op blockers are really the right tool here; and if they are, we
need to document the reason in the commit message at least.

The op blocker model as used until now has a few characteristics:

* Old model: You have a period of time between blocking and unblocking
  during which starting the corresponding operation is prohibited. The
  operation checks whether the blocker is set when it starts, and it
  fails if it is blocked. (See the sketch after this list.)

  New model that Jeff is working on: You still have block/unblock (for a
  category rather than a specific operation, though), but you also have
  a start/stop pair for using the functionality. If you start an
  operation that is blocked, it still errors out. However, blocking an
  operation that is already started fails as well now.

  This series: An already running operation is interrupted when the
  blocker is set, and it continues when it is removed. You had to
  introduce new infrastructure (notifiers) because this doesn't match
  the op blockers model.

* Op blockers used to be set to protect high-level, long-running
  background operations that are usually initiated by the user.
  bdrv_drain() is a low-level, blocking foreground operation.

* Op blockers used to come into effect only in response to monitor
  interactions, i.e. on a slow path. We're now in the I/O path. That's
  not a problem with the current code per se (you introduced notifiers,
  so we don't have to iterate the blocker list all the time), but it
  broadens the scope of op blockers significantly, and we shouldn't do
  that carelessly.

* Op blockers have an error message. It's unused for the new case.
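
For contrast, the first point in code: the classic pattern looks
roughly like this (a sketch with error handling trimmed;
bdrv_op_block_all(), bdrv_op_is_blocked() and bdrv_op_unblock_all()
are the existing interfaces in block.c):

    Error *blocker = NULL;

    /* e.g. when a block job is created: */
    error_setg(&blocker, "block device is in use by a block job");
    bdrv_op_block_all(bs, blocker);

    /* A conflicting operation checks the blocker when it starts and
     * fails if it is set: */
    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_RESIZE, errp)) {
        return -EBUSY;
    }

    /* When the background operation completes: */
    bdrv_op_unblock_all(bs, blocker);
    error_free(blocker);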

This is the first part of what's troubling me with this series: it
makes me doubtful whether op blockers are the right tool to implement
what the commit message says (blocking device I/O). This is "are we
doing the thing right?"

The second part should actually come first, though: "Are we doing the
right thing?" I'm also doubtful whether blocking device I/O is even what
we should do.

Why is device I/O special compared to block job I/O or NBD server I/O?
If the answer is "because block jobs are already paused while draining"
(and probably nobody thought much about the NBD server), then chances
are that it's not the right thing.  In fact, using two different
mechanisms for pausing block jobs and pausing device I/O seems
inconsistent and wrong.

I suspect that the real solution needs to be in the block layer, though
I'm not quite sure yet what it would look like. Perhaps a function pair
like blk_stop/resume_io() that is used by bdrv_drain() callers and works
on the BlockBackend level. (The BDS level is problematic because we
don't want to stop all I/O; many callers just want to drain the I/O of
_other_ callers so they can do their own I/O without having to care
about concurrency.)
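
To illustrate, a bdrv_drain() caller might then do (entirely
hypothetical, neither function exists yet):

    blk_stop_io(blk);           /* device stops submitting new requests */
    bdrv_drain(blk_bs(blk));    /* wait for in-flight requests */

    /* ... do our own I/O without racing against the guest ... */

    blk_resume_io(blk);         /* device may submit requests again */

That way the quiescing is driven by the one block layer user that
needs exclusive access instead of having every device watch for
blockers.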

Kevin


