Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
From: Ryan Harper
Subject: Re: [Qemu-devel] [PATCH 1/2] Fix Block Hotplug race with drive_del()
Date: Thu, 11 Nov 2010 07:25:16 -0600
User-agent: Mutt/1.5.6+20040907i

* Markus Armbruster <address@hidden> [2010-11-11 04:48]:
> Ryan Harper <address@hidden> writes:
>
> > * Markus Armbruster <address@hidden> [2010-11-10 11:40]:
> >> Ryan Harper <address@hidden> writes:
> >>
> >> > * Markus Armbruster <address@hidden> [2010-11-10 06:48]:
> >> >> One real question, and a couple of nits.
> >> >>
> >> >> Ryan Harper <address@hidden> writes:
> >> >>
> >> >> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> >> >> > unplug event; this may not happen synchronously with the device removal command
> >> >>
> >> >> Well, I wouldn't call unplug "racy". It just takes an unpredictable
> >> >> length of time, possibly forever. To make a race, you need to throw in
> >> >> a client assuming (incorrectly) that unplug is instantaneous, as
> >> >> described in your next paragraph.
> >> >>
> >> >> Moreover, all PCI unplug is that way, not just block.
> >> >>
> >> >> > This series aims to close a gap where by mgmt applications that assume the
> >> >> > block resource has been removed without confirming that the guest has
> >> >> > acknowledged the removal may re-assign the underlying device to a second guest
> >> >> > leading to data leakage.
> >> >>
> >> >> Yes, the incorrect assumption is a problem. But with that fixed (in the
> >> >> management application), we run right into the next problem: there is no
> >> >> way for the management application to reliably disconnect the guest from
> >> >> a block device. And that's the problem you're fixing.
> >> >
> >> > Yeah, that's the right way to word it; providing a method to forcibly
> >> > disconnect the guest from the host device.
> >> >>
> >> >> > This series introduces a new montor command to decouple asynchornous device
> >> >>
> >> >> Typos "montor" and "asynchornous". You might want to use a spell
> >> >> checker :)
> >> >>
> >> >> Lines are a bit long. Recommend wrap at column 70.
> >> >>
> >> >> > removal from restricting guest access to a block device. We do this by creating
> >> >> > a new monitor command drive_del which maps to a bdrv_unplug() command which
> >> >> > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, subsequent
> >> >> > IO is rejected from the device and the guest will get IO errors but continue to
> >> >> > function. In addition to preventing further IO, we clean up state pointers
> >> >> > between host (BlockDriverState) and guest (DeviceInfo).
> >> >> >
> >> >> > A subsequent device removal command can be issued to remove the device, to which
> >> >> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> >> >>
> >> >> "maynot" is not a word.
> >> >>
> >> >> > will be sumbitted.
> >> >>
> >> >> This suggests running drive_del before device_del, which makes the device
> >> >> go through a "broken device" state on its way to unplug. If the guest
> >> >> accesses the device in that state, it gets I/O errors. Not nice.
> >> >>
> >> >> Instead, I'd recommend device_del, wait for the device to go away,
> >> >> drive_del on time out. If the guest reacts to the ACPI unplug promptly,
> >> >> it's never exposed to the "broken device" state. Note: if the drive_del
> >> >> fails because the device doesn't exist, we lost the race with the
> >> >> automatic destruction, which is harmless. Ignore that error.
> >> >
> >> > Honestly, other than describing what happens if you sever the connection
> >> > when the guest isn't aware of it; I don't want to try to capture how the
> >> > mgmt layer implements the removal.
> >> >
> >> > One may want to force the disconnect before attempting to remove the
> >> > device; or the other way around; that's really the mgmt layer's call.
> >>
> >> Fair enough.
> >>
> >> >> > Signed-off-by: Ryan Harper <address@hidden>
> >> >> > ---
> >> >> > block.c | 7 +++++++
> >> >> > block.h | 1 +
> >> >> > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++
> >> >> > blockdev.h | 1 +
> >> >> > hmp-commands.hx | 18 ++++++++++++++++++
> >> >> > 5 files changed, 63 insertions(+), 0 deletions(-)
> >> >> >
> >> >> > diff --git a/block.c b/block.c
> >> >> > index 6b505fb..c76a796 100644
> >> >> > --- a/block.c
> >> >> > +++ b/block.c
> >> >> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> >> >> >      }
> >> >> >  }
> >> >> >
> >> >> > +void bdrv_unplug(BlockDriverState *bs)
> >> >> > +{
> >> >> > +    qemu_aio_flush();
> >> >> > +    bdrv_flush(bs);
> >> >> > +    bdrv_close(bs);
> >> >> > +}
> >> >> > +
> >> >>
> >> >> Unless we expect more users, I'd inline this into its only caller.
> >> >> Matter of taste.
> >> >
> >> > Works for me.
> >> >
> >> >>
> >> >> >  int bdrv_is_removable(BlockDriverState *bs)
> >> >> >  {
> >> >> >      return bs->removable;
> >> >> > diff --git a/block.h b/block.h
> >> >> > index 78ecfac..581414c 100644
> >> >> > --- a/block.h
> >> >> > +++ b/block.h
> >> >> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> >> >> >                         BlockErrorAction on_write_error);
> >> >> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >> >> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> >> >> > +void bdrv_unplug(BlockDriverState *bs);
> >> >> >  int bdrv_is_removable(BlockDriverState *bs);
> >> >> >  int bdrv_is_read_only(BlockDriverState *bs);
> >> >> >  int bdrv_is_sg(BlockDriverState *bs);
> >> >> > diff --git a/blockdev.c b/blockdev.c
> >> >> > index 6cb179a..ee8c2ec 100644
> >> >> > --- a/blockdev.c
> >> >> > +++ b/blockdev.c
> >> >> > @@ -14,6 +14,8 @@
> >> >> >  #include "qemu-option.h"
> >> >> >  #include "qemu-config.h"
> >> >> >  #include "sysemu.h"
> >> >> > +#include "hw/qdev.h"
> >> >> > +#include "block_int.h"
> >> >> >
> >> >> >  static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
> >> >> >
> >> >> > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device,
> >> >> >      }
> >> >> >      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >> >> >  }
> >> >> > +
> >> >> > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >> >> > +{
> >> >> > +    const char *id = qdict_get_str(qdict, "id");
> >> >> > +    BlockDriverState *bs;
> >> >> > +    Property *prop;
> >> >> > +
> >> >> > +    bs = bdrv_find(id);
> >> >> > +    if (!bs) {
> >> >> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
> >> >> > +        return -1;
> >> >> > +    }
> >> >> > +
> >> >> > +    /* quiesce block driver; prevent further io */
> >> >> > +    bdrv_unplug(bs);
> >> >> > +
> >> >> > +    /* clean up guest state from pointing to host resource by
> >> >> > +     * finding and removing DeviceState "drive" property */
> >> >> > +    for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >> >> > +        if ((prop->info->type == PROP_TYPE_DRIVE) &&
> >> >> > +            (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == bs)) {
> >> >> > +            if (prop->info->free) {
> >> >> > +                prop->info->free(bs->peer, prop);
> >> >> > +            }
> >>
> >> Your use of prop->info->free() in this context is wrong. More below.
> >>
> >> >>
> >> >> Does this null the drive property? I doubt it. Quick check in the
> >> >> debugger?
> >> >>
> >> >> The free callbacks generally don't zap the properties, because they run
> >> >> from qdev_free().
> >> >
> >> > To be honest; I didn't see anything that looked like "remove this
> >> > property" in the qdev api. Any pointers?
> >>
> >> The closest we have is indeed the Property method free(), but that's not
> >> quite right. It's really only for use by qdev_free().
> >>
> >> > should I be calling qdev_free() on the dev?
> >>
> >> No, because then the whole device is gone, not just the property :)
> >>
> >> > I don't quite understand
> >> > the distinction between the info list of properties and the device
> >> > itself, nor specifically what we need to remove in the drive_del()
> >> > operation versus the device_del() portion.
> >>
> >> device_del / qdev_free() destroy a qdev, such as a "virtio-blk-pci"
> >> device (C type VirtIOPCIProxy).
> >>
> >> drive_del destroys something else, namely the block device host part
> >> (BlockDriverState + DeviceInfo). Obviously, it needs to zap all
> >> pointers to the host part along with it. Specifically, it needs to zap
> >> the device's pointer to it.
> >>
> >> Example: if a "virtio-blk-pci" device is using drive "foo", then
> >> "drive_del foo" needs to zap its member block.bs.
> >>
> >> Complication: we don't (want to) know what kind of device exactly is
> >> using the drive. But we do know that a drive property must be
> >> describing it.
> >>
> >> So we search the properties (for (prop...)) for a drive property
> >> (prop->info->type == PROP_TYPE_DRIVE) that points to this drive (... ==
> >> bs).
> >>
> >> Result:
> >>
> >>     BlockDriverState *bs;
> >>     Property *prop;
> >>     BlockDriverState **ptr;
> >>     [...]
> >>     for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >>         if ((prop->info->type == PROP_TYPE_DRIVE)) {
> >>             ptr = qdev_get_prop_ptr(dev, prop);
> >>             if (*ptr == bs) {
> >>                 bdrv_detach(bs, bs->peer);
> >
> > Invoking the free method on the drive property does do detach:
> >
> > free_drive
> > {
> >     BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
> >
> >     if (*ptr) {
> >         bdrv_detach(*ptr, dev);
> >         blockdev_auto_del(*ptr);
> >     }
> > }
> >
> > and the bdrv_delete()
> >
> > takes out the bs pointer.
>
> Which pointer? Which bdrv_delete()?
I suppose it's the BlockDriverState pointer returned from bdrv_find(),
since I'm invoking bdrv_delete(bs).
And I suppose qdev_get_prop_ptr() is returning a different pointer to the
same bs, in which case we'll still need the null you had suggested?
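
To make the pointer question concrete, here is a minimal standalone
sketch. The Mock* types and mock_* functions are hypothetical stand-ins
I made up for illustration, not QEMU's real definitions. It only shows
the general C point: the device's drive property is its own slot holding
a copy of the bs pointer, so dropping bs elsewhere does not clear that
slot, and it has to be set to NULL explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState; not the QEMU struct. */
typedef struct MockBlockDriverState {
    int open;
} MockBlockDriverState;

/* Hypothetical stand-in for a device with a single "drive" property. */
typedef struct MockDevice {
    MockBlockDriverState *drive;   /* the property slot */
} MockDevice;

/* Plays the role of qdev_get_prop_ptr(): returns the ADDRESS of the
 * slot, so the caller can read or overwrite what the device stores. */
static MockBlockDriverState **mock_get_prop_ptr(MockDevice *dev)
{
    assert(dev != NULL);
    return &dev->drive;
}

/* Clear the device's slot if it refers to bs.
 * Returns 1 if the slot was cleared, 0 if it pointed elsewhere. */
static int mock_zap_drive_pointer(MockDevice *dev, MockBlockDriverState *bs)
{
    MockBlockDriverState **ptr = mock_get_prop_ptr(dev);

    if (*ptr == bs) {
        *ptr = NULL;
        return 1;
    }
    return 0;
}
```

Clearing a local copy of the pointer leaves dev->drive untouched; only
writing through the slot address changes what the device sees.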
>
> >> Only then are we ready to destroy the host part:
> >>
> >> drive_uninit(drive_get_by_blockdev(bs));
> >
> > And if auto-deletion is set, then it handles the drive_uninit(). Do you
> > think we should explicitly invoke drive_uninit()?
>
> Actually, blockdev_auto_del() deletes the block device only if DriveInfo
> has auto_del set. Why is that? Quote blockdev.c:
>
> /*
> * We automatically delete the drive when a device using it gets
> * unplugged. Questionable feature, but we can't just drop it.
> * Device models call blockdev_mark_auto_del() to schedule the
> * automatic deletion, and generic qdev code calls blockdev_auto_del()
> * when deletion is actually safe.
> */
>
> Thus, you need to blockdev_mark_auto_del() before blockdev_auto_del().
>
> However, my blockdev_add work-in-progress changes these two functions to
> *only* delete block devices created the old way (-drive, drive_add).
> You want them deleted regardless of how they were created. That's why I
> asked you to use drive_uninit() directly.
Gotcha.
>
> You could argue that Property method free() *should* work here. Fair
> point. If you want to clean that up, you're quite welcome. But I don't
> want to burden your fix with that, so feel free to add a suitable
> comment instead.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
address@hidden