|
From: | Hanna Czenczek |
Subject: | Re: [PATCH] virtio: Always reset vhost devices |
Date: | Thu, 11 Jul 2024 13:43:55 +0200 |
User-agent: | Mozilla Thunderbird |
On 10.07.24 15:39, Matias Ezequiel Vara Larsen wrote:
Hello Hanna, On Wed, Jul 10, 2024 at 01:23:10PM +0200, Hanna Czenczek wrote:Requiring `vhost_started` to be true for resetting vhost devices in `virtio_reset()` seems like the wrong condition: Most importantly, the preceding `virtio_set_status(vdev, 0)` call will (for vhost devices) end up in `vhost_dev_stop()` (through vhost devices' `.set_status` implementations), setting `vdev->vhost_started = false`. Therefore, the gated `vhost_reset_device()` call is unreachable. `vhost_started` is not documented, so it is hard to say what exactly it is supposed to mean, but judging from the fact that `vhost_dev_start()` sets it and `vhost_dev_stop()` clears it, it seems like it indicates whether there is a vhost back-end, and whether that back-end is currently running and processing virtio requests. Making a reset conditional on whether the vhost back-end is processing virtio requests seems wrong; in fact, it is probably better to reset it only when it is not currently processing requests, which is exactly the current order of operations in `virtio_reset()`: First, the back-end is stopped through `virtio_set_status(vdev, 0)`, then we want to send a reset. Therefore, we should drop the `vhost_started` condition, but in its stead we then have to verify that we can indeed send a reset to this vhost device, by not just checking `k->get_vhost != NULL` (introduced by commit 95e1019a4a9), but also that the vhost back-end is connected (`hdev = k->get_vhost(); hdev != NULL && hdev->vhost_ops != NULL`).I am not a native speaker but I think the commit message could be rephrased to make it clear. This is a minor comment so feel free to ignore it:
Thanks! I don’t mind restructuring, I just wanted to be verbose on what’s happening. Honestly, I’m also not 100 % sure about my reasoning, which is why I put a lot of indefinite fluff in there, i.e. purposefully was not clear.
I wonder what exactly you find to be the problem with it, though? Too much fluff, too much detail, weird sentence structure?
Before `virtio_reset()` is invoked, `virtio_set_status(vdev, 0)` for vhost devices ends up setting `vhost_stared` to false thus resulting in `vhost_reset_device()` being never reached during a reset.
It’s before vhost_reset_device(), not virtio_reset() (all of this is in virtio_reset()). Also, the “`virtio_set_status()` for vhost devices” sounds a bit strange and too detail-less to me; (A) it makes it sound a bit like it’d be only called for vhost devices (easily solvable by moving “for vhost devices” back after ”false”), and (B) I would like to have the detail in there how vhost_started is cleared by the set_status call, because it doesn’t seem obvious to me. Also, I find the “never reached during a reset” strangely abstract; we can be very concrete here, so why not be?
MaybeWhen a vhost device is reset, we want `virtio_reset()` to call `vhost_reset_device()`. However, `virtio_reset()` begins with `virtio_set_status(vdev, 0)`, which, for vhost devices, will end up in `vhost_dev_stop()`, clearing `vhost_started`. This makes the subsequent `vhost_reset_device()` call unreachable.
`vhost_started` is not documented, however, from `vhost_dev_start()` and `vhost_dev_stop()`, it seems indicating that there is a vhost back-end and that the back-end is running and processing virtio requests. Resetting should not rely on whether the vhost back-end is processing virtio requests, instead, reset must happen if there is no processing request. This is what `virtio_reset()` does, it first stops the backend and then sends the reset.
I find this unclear on how vhost_dev_start()/stop() indicate what vhost_started means (it’s because they set/clear it). I cannot stand behind “reset must happen if there is no processing request”, because I don’t know for sure. I just feel like it makes more sense this way, hence me saying “probably”. I miss context to “it stops the backend”, I would like to be clear that it’s through virtio_set_status(0) that the back-end is stopped. I find it important to be concrete here because I asked myself whether e could pull vhost_reset_device() up before virtio_set_status(0), and this paragraph is intended to convey that that would be wrong; I want to be clear that the concrete call order in that function makes sense to me.
I also miss a connection to why this paragraph is focussing on vhost_started now. Maybe I’m a bit slow at making connections, but in this rephrased version, it is not 100 % clear that vhost_started is a condition for vhost_reset_device() (unless one looks at the code simultaneously), so its importance and why we need to know what it does isn’t clear here. The very start of the original message however did stress that the vhost_started condition seems wrong, so it’s clear why it’s important at this point.
So maybeRequiring `vhost_started` as a precondition for `vhost_reset_device()` is also semantically questionable: While undocumented, it being set/cleared by `vhost_dev_{start,stop}()` indicates that it shows whether there is a vhost back-end, and whether that back-end is running and processing virtio requests. The latter should not be a condition for doing a reset, on the contrary: A reset should preferably happen when the back-end is stopped. Thus, the order that we have in `virtio_reset()` (stop the back-end via `virtio_set_status(vdev, 0)`, then reset it) makes perfect sense, but checking `vhost_started` does not.
To trigger a reset, this commit drops the `vhost_started` condition and checks that the device is running (introduced by 95e1019a4a9) but also that the vhost back-end is connected so it is possible to send a reset to the vhost device.
This seems incorrect: First, we decidedly no longer check that the device is running, that was the point of the previous paragraph. Second, no such check has been introduced by commit 95e1019a4a9. That commit just introduced a check of `k->get_vhost != NULL`, i.e. whether this could even theoretically be a vhost device.
So I’d just keep the final paragraph as-is from the original patch. It isn’t much longer, just quotes more code.
Hanna
[Prev in Thread] | Current Thread | [Next in Thread] |