From: Fam Zheng
Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET"
Date: Tue, 21 Apr 2015 10:37:00 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, 04/20 19:36, Michael S. Tsirkin wrote:
> On Fri, Apr 17, 2015 at 03:59:15PM +0800, Fam Zheng wrote:
> > Currently, virtio code chooses to kill QEMU if the guest passes any invalid
> > data with vring. That has drawbacks such as losing unsaved data (e.g. when
> > guest user is writing a very long email), or possible denial of service in
> > a nested vm use case where virtio device is passed through.
> >
> > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used
> > to improve this by communicating the error state between virtio devices and
> > drivers. The device notifies guest upon setting the bit, then the guest
> > driver should detect this bit and report to userspace, or recover the
> > device by resetting it.
>
> Unfortunately, virtio 1 spec does not have a conformance statement
> that requires driver to recover. We merely have a non-normative looking
> text:
> Note: For example, the driver can’t assume requests in flight
> will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> they have not been completed. A good implementation will try to recover
> by issuing a reset.
>
> Implementing this reset for all devices in a race-free manner might also
> be far from trivial. I think we'd need a feature bit for this.
> OTOH as long as we make this a new feature, would an ability to
> reset a single VQ be a better match for what you are trying to
> achieve?
I think that is too complicated as a recovery measure. A device-level reset
would at least get us back to a deterministic state.
>
> > This series makes necessary changes in virtio core code, based on which
> > virtio-blk is converted. Other devices now keep the existing behavior by
> > passing in "error_abort". They will be converted in following series. The
> > Linux driver part will also be worked on.
> >
> > One concern with this behavior change is that it's now harder to notice the
> > actual driver bug that caused the error, as the guest continues to run. To
> > address that, we could probably add a new error action option to virtio
> > devices, similar to the "read/write werror" in block layer, so the vm
> > could be paused and the management will get an event in QMP like pvpanic.
> > This work can be done on top.
>
> At the architectural level, that's only one concern. Others would be
> - workloads such as openstack handle guest crash better than
> a guest that's e.g. slow because of a memory leak
What memory leak are you referring to?
> - it's easier for guests to probe host for security issues
> if guest isn't killed
> - guest can flood host log with guest-triggered errors
We can still abort() if the guest is triggering errors too quickly.
Fam