[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi
From: |
Maxim Levitsky |
Subject: |
Re: [PATCH v3 0/7] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread |
Date: |
Thu, 13 Aug 2020 15:26:21 +0300 |
User-agent: |
Evolution 3.36.3 (3.36.3-1.fc32) |
On Wed, 2020-07-15 at 18:01 +0300, Maxim Levitsky wrote:
> Hi!
>
> This is a patch series that is a result of my discussion with Paulo on
> how to correctly fix the root cause of the BZ #1812399.
>
> The root cause of this bug is the fact that IO thread is running mostly
> unlocked versus main thread on which device hotplug is done.
>
> qdev_device_add first creates the device object, then places it on the bus,
> and only then realizes it.
>
> However some drivers and currently only virtio-scsi enumerate its child bus
> devices on each request that is received from the guest and that can happen
> on the IO
> thread.
>
> Thus we have a window when new device is on the bus but not realized and can
> be accessed
> by the virtio-scsi driver in that state.
>
> Fix that by doing two things:
>
> 1. Add partial RCU protection to the list of a bus's child devices.
> This allows the scsi IO thread to safely enumerate the child devices
> while it races with the hotplug placing the device on the bus.
>
> 2. Make the virtio-scsi driver check .realized property of the scsi device
> and avoid touching the device if it isn't
>
> Note that in the particular bug report the issue wasn't a race but rather due
> to combination of things, the .realize code in the middle managed to trigger
> IO on the virtqueue
> which caused the virtio-scsi driver to access the half realized device.
> However
> since this can happen as well with real IO thread, this patch series was done,
> which fixes this as well.
>
> Changes from V1:
> * Patch 2 is new, as suggested by Stefan, added drain_call_rcu() to fix the
> failing unit test,
> make check pass now
>
> * Patches 6,7 are new as well: I added scsi_device_get as suggested by
> Stefan as well, although
> this is more a refactoring that anything else as it doesn't solve
> an existing race.
>
> * Addressed most of the review feedback from V1
> - still need to decide if we need QTAILQ_FOREACH_WITH_RCU_READ_LOCK
>
> Changes from V2:
>
> * No longer RFC
> * Addressed most of the feedback from Stefan
> * Fixed reference count leak in patch 7 when device is about to be
> unrealized
> * Better testing
>
> This series was tested by adding a virtio-scsi drive with iothread,
> then running fio stress job in the guest in a loop, and then adding/removing
> the scsi drive on the host in the loop.
> This test was failing usually on 1st iteration withouth this patch series,
> and now it seems to work smoothly.
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (7):
> scsi/scsi_bus: switch search direction in scsi_device_find
> Implement drain_call_rcu and use it in hmp_device_del
> device-core: use RCU for list of childs of a bus
> device-core: use atomic_set on .realized property
> virtio-scsi: don't touch scsi devices that are not yet realized or
> about to be un-realized
> scsi: Add scsi_device_get
> virtio-scsi: use scsi_device_get
>
> hw/core/bus.c | 28 +++++++++++++--------
> hw/core/qdev.c | 56 +++++++++++++++++++++++++++++++-----------
> hw/scsi/scsi-bus.c | 48 +++++++++++++++++++++++++++++++-----
> hw/scsi/virtio-scsi.c | 47 ++++++++++++++++++++++++++++-------
> include/hw/qdev-core.h | 11 +++++++++
> include/hw/scsi/scsi.h | 2 ++
> include/qemu/rcu.h | 1 +
> qdev-monitor.c | 22 +++++++++++++++++
> util/rcu.c | 55 +++++++++++++++++++++++++++++++++++++++++
> 9 files changed, 230 insertions(+), 40 deletions(-)
>
> --
> 2.26.2
>
Seems that this patch series doesn't have any disagreeements, so
anybody wants to put it on an maintainer's tree, now that the freeze is over?
Best regards,
Maxim Levitsky