On May 15, 2024, at 23:47, Raphael Norwitz <raphael@enfabrica.net> wrote:
The case you're describing makes sense, but now I have some concerns about the vhost_dev_cleanup bit.

On Wed, May 15, 2024 at 1:47 AM Li Feng <fengli@smartx.com> wrote:
On May 14, 2024, at 21:58, Raphael Norwitz <raphael@enfabrica.net> wrote:
Code looks good. Just a question on the error case you're trying to fix.
On Tue, May 14, 2024 at 2:12 AM Li Feng <fengli@smartx.com> wrote:
When vhost-user is reconnecting to the backend and fails at get_features in vhost_dev_init(), the reconnect fails and will never be retriggered.

The reason: when vhost-user fails at get_features, vhost_dev_cleanup() is called immediately.
vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
The reconnect path is:
vhost_user_blk_event
  vhost_user_async_close(.. vhost_user_blk_disconnect ..)
    qemu_chr_fe_set_handlers <----- clear the notifier callback
    schedule vhost_user_async_close_bh
The vhost->vdev is NULL, so vhost_user_blk_disconnect() will not be called, and thus the event fd callback will never be reinstalled.
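For reference, the pre-patch bottom half looks roughly like this (simplified; the exact code is in the hw/virtio/vhost-user.c hunk below). Because vhost_dev_cleanup() has already zeroed the vhost_dev, vhost->vdev is NULL and the callback that would re-install the handler is skipped:

static void vhost_user_async_close_bh(void *opaque)
{
    VhostAsyncCallback *data = opaque;
    struct vhost_dev *vhost = data->vhost;

    /*
     * vhost_dev_cleanup() already memset the vhost_dev to zero, so
     * vhost->vdev is NULL here and the disconnect callback (the only
     * place the chardev event handler gets re-installed) never runs.
     */
    if (vhost->vdev) {
        data->cb(data->dev);
    }
    g_free(data);
}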
With this patch, vhost_user_blk_disconnect() will call vhost_dev_cleanup() again; this is safe.
In addition, a CLOSE event may occur while connected is false. In that case the event handler will be cleared, and we need to ensure that it remains installed.
Following on from the prior patch, why would "connected" be false when a CLOSE event happens?
In the OPEN event handling, vhost_user_blk_connect() calls vhost_dev_init() and hits an error, so s->connected remains false. When the CLOSE event later arrives, s->connected is found to be false and nothing is done; however, the event handler has already been cleared in `vhost_user_async_close` before the CLOSE event is processed.
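Concretely, before this patch the disconnect path bails out before it can re-install the handler. A simplified sketch of the pre-patch vhost_user_blk_disconnect(), matching the hw/block/vhost-user-blk.c hunk below:

static void vhost_user_blk_disconnect(DeviceState *dev)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VHostUserBlk *s = VHOST_USER_BLK(vdev);

    if (!s->connected) {
        /*
         * The handler was already cleared by vhost_user_async_close()
         * before this CLOSE event ran; returning here means it is
         * never re-installed, so future OPEN events are lost.
         */
        return;
    }
    s->connected = false;

    vhost_user_blk_stop(vdev);
    vhost_dev_cleanup(&s->dev);

    /* Re-instate the event handler for new connections */
    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
                             vhost_user_blk_event, NULL, dev, NULL, true);
}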
Got it - I see why the event handler is never re-installed in the code as it was before if we fail at get_features. That said, how do you explain your comment:

"With this patch, the vhost_user_blk_disconnect will call the vhost_dev_cleanup() again, it's safe."

OK, I will update the commit message, because this code changed some months ago.
I see vhost_dev_cleanup() accessing hdev without even a NULL check. In the case we're talking about here I don't think it's a problem, because if vhost_dev_init() fails, connected will be false and we'll hit the goto, but I am concerned that there could be double-frees or use-after-frees in other cases.
OK, you are right; with this patch, vhost_dev_cleanup() will not be called multiple times now.

That said, there is no need to worry about calling vhost_dev_cleanup() multiple times: historically it has been allowed to be called multiple times, and looking at the code confirms that repeated calls are indeed safe.

Look at this patch:
commit e0547b59dc0ead4c605d3f02d1c8829630a1311b
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date:   Wed Jul 27 01:15:02 2016 +0400
vhost: make vhost_dev_cleanup() idempotent
It is called on multiple code path, so make it safe to call several times (note: I don't remember a reproducer here, but a function called 'cleanup' should probably be idempotent in my book)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
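As a rough illustration of the idempotent-cleanup pattern (a hypothetical stand-in, not the actual QEMU function): each teardown step tolerates already-freed state, and the trailing memset guarantees that a second call sees a zeroed struct and does nothing.

#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct vhost_dev; only the pattern matters. */
struct dev_state {
    void *priv;    /* backend-private allocation, freed on cleanup */
};

/* Idempotent cleanup: safe to call any number of times. */
static void dev_cleanup(struct dev_state *d)
{
    free(d->priv);               /* free(NULL) is a no-op */
    memset(d, 0, sizeof(*d));    /* a second call sees all-zero state */
}

int main(void)
{
    struct dev_state d = { .priv = malloc(16) };
    dev_cleanup(&d);
    dev_cleanup(&d);             /* harmless: no double-free */
    return 0;
}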
Thanks, Li
All vhost-user devices have this issue, including vhost-user-blk/scsi.
Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
Signed-off-by: Li Feng <fengli@smartx.com>
---
 hw/block/vhost-user-blk.c   | 3 ++-
 hw/scsi/vhost-user-scsi.c   | 3 ++-
 hw/virtio/vhost-user-base.c | 3 ++-
 hw/virtio/vhost-user.c      | 10 +---------
 4 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 41d1ac3a5a..c6842ced48 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -353,7 +353,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
 
     if (!s->connected) {
-        return;
+        goto done;
     }
     s->connected = false;
 
@@ -361,6 +361,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
 
     vhost_dev_cleanup(&s->dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
                              vhost_user_blk_event, NULL, dev, NULL, true);
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 48a59e020e..b49a11d23b 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -181,7 +181,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
 
     if (!s->connected) {
-        return;
+        goto done;
     }
     s->connected = false;
 
@@ -189,6 +189,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
 
     vhost_dev_cleanup(&vsc->dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL,
                              vhost_user_scsi_event, NULL, dev, NULL, true);
diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
index 4b54255682..11e72b1e3b 100644
--- a/hw/virtio/vhost-user-base.c
+++ b/hw/virtio/vhost-user-base.c
@@ -225,13 +225,14 @@ static void vub_disconnect(DeviceState *dev)
     VHostUserBase *vub = VHOST_USER_BASE(vdev);
 
     if (!vub->connected) {
-        return;
+        goto done;
     }
     vub->connected = false;
 
     vub_stop(vdev);
     vhost_dev_cleanup(&vub->vhost_dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&vub->chardev, NULL, NULL, vub_event,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c929097e87..c407ea8939 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2781,16 +2781,8 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
     VhostAsyncCallback *data = opaque;
-    struct vhost_dev *vhost = data->vhost;
 
-    /*
-     * If the vhost_dev has been cleared in the meantime there is
-     * nothing left to do as some other path has completed the
-     * cleanup.
-     */
-    if (vhost->vdev) {
-        data->cb(data->dev);
-    }
+    data->cb(data->dev);
 
     g_free(data);
 }
-- 
2.45.0