Re: [Qemu-devel] [PATCH 1/3] virtio: Introduce VirtIODevice.broken


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH 1/3] virtio: Introduce VirtIODevice.broken
Date: Mon, 28 Apr 2014 05:59:41 +0300

On Mon, Apr 28, 2014 at 09:58:53AM +0800, Fam Zheng wrote:
> On Sun, 04/27 11:59, Michael S. Tsirkin wrote:
> > On Sun, Apr 27, 2014 at 09:34:06AM +0100, Peter Maydell wrote:
> > > On 27 April 2014 09:29, Michael S. Tsirkin <address@hidden> wrote:
> > > > On Tue, Apr 22, 2014 at 04:55:15PM +0800, Fam Zheng wrote:
> > > >> If the guest driver behaves abnormally, emulation code could mark the
> > > >> device as "broken".
> > > >>
> > > >> Once "broken" is set, device emulation will typically wait for a reset
> > > >> command and ignore any other operations, but it could also return error
> > > >> responses. In other words, whether and how the guest learns about this
> > > >> error status is device specific.
> > > >>
> > > >> Signed-off-by: Fam Zheng <address@hidden>
> > > >
> > > > I'm assuming the idea is to make debugging guest drivers easier
> > > > for people not familiar with qemu?
> > > 
> > > As a general rule, guests shouldn't be able to cause QEMU to
> > > just randomly exit. We have a bunch of code in tree which does
> > > handle guest errors this way, of course, but cleanups to fix it
> > > are worth having.
> > 
> > OK, so by supplying a wrong address, a guest can make an MMIO handler
> > start MMIO on the device itself instead of doing DMA, and this will
> > cause an infinite loop. Any idea how to fix this?
> > 
> > > The benefits include that one duff device
> > > driver doesn't take out your whole VM, that you have a chance
> > > for a clean shutdown, and that a reboot might restore operation of
> > > the offending device.
> > > 
> > > There was a thread about this a little while back.
> > > 
> > > thanks
> > > -- PMM
> > 
> > I agree, but I'd like the change to be done in a way that does not make
> > debugging harder.
> > 
> 
> Michael, Why is debugging harder with this patch, please?
> 
> Fam

Several points:
        1.  The guest will keep running and modifying guest memory,
            potentially erasing the error state.

                Several ideas (maybe implement them all):
                        - add options to exit qemu / pause the guest /
                          try to ignore the error / cause the guest to
                          panic (the disk layer already has options like
                          this, correct?)
                        - pass a debugging string to e.g. the pvpanic
                          device (if present), and make it possible for
                          the guest to retrieve it
                        - extend the virtio spec: send a config interrupt
                          to the driver, and make it possible for the
                          driver to retrieve the error (rough sketch
                          below)
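
A rough sketch of that last idea (the "broken" field is what this patch
introduces, and virtio_notify_config() is an existing helper;
virtio_mark_broken() and the config-space error code are made up here for
illustration):

#include "hw/virtio/virtio.h"

/* Sketch only: mark the device broken and tell the driver about it. */
static void virtio_mark_broken(VirtIODevice *vdev, uint32_t error_code)
{
    if (vdev->broken) {
        return;                     /* already broken, nothing more to do */
    }
    vdev->broken = true;

    /* Hypothetical spec extension: expose the error code in config space
     * so the driver can fetch it after the interrupt below. */
    /* vdev->config_error = error_code; */

    virtio_notify_config(vdev);     /* existing config-change interrupt */
}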

        2.  After migration, the device will forget the "broken" status
            and will resume modifying guest memory.
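
One way to fix point 2 would be to migrate the flag in an optional
subsection; a sketch using the standard VMState macros, assuming the new
field is a bool (the subsection name and the needed() callback are invented
here, and wiring the subsection into the existing virtio save/load code is
left out):

#include "hw/virtio/virtio.h"
#include "migration/vmstate.h"

/* Sketch: emit the "broken" flag in an optional migration subsection so
 * the destination does not forget it; the subsection is only sent when
 * the flag is set. */
static bool virtio_broken_needed(void *opaque)
{
    VirtIODevice *vdev = opaque;

    return vdev->broken;
}

static const VMStateDescription vmstate_virtio_broken = {
    .name = "virtio/broken",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = virtio_broken_needed,
    .fields = (VMStateField[]) {
        VMSTATE_BOOL(broken, VirtIODevice),
        VMSTATE_END_OF_LIST()
    }
};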


        3.  A separate issue: there seems to be a security problem here.
            A malicious guest can trigger this error_report, reset the
            device and trigger it again, flooding the monitor with
            errors.

                        Idea:
                        disable this message after the first error and
                        simply increment a counter instead, then add a
                        new command to retrieve the counters and
                        re-enable the message?
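
The counter idea could be as small as this (error_report() is the usual QEMU
helper; the error_count and error_reported fields, and the monitor command
that would read and re-arm them, are hypothetical):

#include "qemu/error-report.h"
#include "hw/virtio/virtio.h"

/* Sketch: print the first guest-triggered error, count the rest silently.
 * A new monitor command would return error_count and clear error_reported
 * to re-enable the message. */
static void virtio_report_guest_error(VirtIODevice *vdev, const char *msg)
{
    vdev->error_count++;
    if (vdev->error_reported) {
        return;                 /* already reported once, stay quiet */
    }
    vdev->error_reported = true;
    error_report("%s: %s (further errors suppressed)", vdev->name, msg);
}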


All in all, it seems like a lot of work.


But I think I see a nice, elegant solution: stop the VM when a
device detects an internal error and becomes broken. This should be
a general event telling management that this happened - similar to
the pvpanic event. Management can then add options to restart the
guest / reset the guest / report to the user / dump memory.

How about also dumping the queue state as part of the event, in
case it gets modified afterwards?
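
Something along these lines, perhaps (vm_stop() and RUN_STATE_INTERNAL_ERROR
exist in QEMU, though the header location varies by version; the
DEVICE_BROKEN QMP event and its send function are hypothetical and would
need a QAPI schema addition):

#include "hw/virtio/virtio.h"

/* Sketch: when emulation decides the device is broken, notify management
 * and freeze the guest so it cannot overwrite the evidence - much like
 * the pvpanic/GUEST_PANICKED flow. */
static void virtio_handle_broken(VirtIODevice *vdev)
{
    vdev->broken = true;

    /* Hypothetical QMP event carrying the device name; it could also
     * carry a dump of the offending queue state. */
    /* qapi_event_send_device_broken(vdev->name); */

    /* Stop the VM; management can then dump memory, reset the device,
     * or restart/reset the guest. */
    vm_stop(RUN_STATE_INTERNAL_ERROR);
}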

Also, as long as you are adding this, could you add
handling for vhost errors too? vhost supports specifying an
error fd which it can signal on error, but that is
currently unused.
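
For reference, the kernel side of that hook already exists: VHOST_SET_VRING_ERR
takes an eventfd which vhost signals when it hits an error on a virtqueue. A
user-space sketch of registering it (hooking the returned fd into QEMU's event
loop and deciding how to react is the missing piece):

#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Sketch: give vhost an eventfd to signal virtqueue errors on.
 * VHOST_SET_VRING_ERR and struct vhost_vring_file come from
 * <linux/vhost.h>; acting on the notification is left to the caller. */
static int vhost_set_error_notifier(int vhost_fd, unsigned int queue_index)
{
    int err_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (err_fd < 0) {
        return -1;
    }

    struct vhost_vring_file file = {
        .index = queue_index,
        .fd = err_fd,
    };
    if (ioctl(vhost_fd, VHOST_SET_VRING_ERR, &file) < 0) {
        close(err_fd);
        return -1;
    }
    return err_fd;      /* caller polls this fd for error notifications */
}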


Thanks!

-- 
MST


