Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on


From: Alex Williamson
Subject: Re: [Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
Date: Thu, 16 Nov 2017 09:04:48 -0700

On Thu, 16 Nov 2017 17:59:17 +0200
"Michael S. Tsirkin" <address@hidden> wrote:

> On Thu, Nov 16, 2017 at 08:53:45AM -0700, Alex Williamson wrote:
> > On Thu, 16 Nov 2017 17:28:44 +0200
> > "Michael S. Tsirkin" <address@hidden> wrote:
> >   
> > > On Thu, Nov 16, 2017 at 01:23:39PM +0100, Greg Kurz wrote:  
> > > > Hi,
> > > > 
> > > > I'm resurrecting a thread about a QEMU crash we're still hitting on ppc64.
> > > > It was reported to the list by Bharata 2 months ago:
> > > > 
> > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03685.html
> > > > 
> > > > "Hi,
> > > > 
> > > > QEMU hits the below assert
> > > > 
> > > > qemu-system-ppc64: used ring relocated for ring 2
> > > > qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed.
> > > > 
> > > > in the following scenario:
> > > > 
> > > > 1. Boot guest with vhost=on
> > > >    -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on
> > > >    -device virtio-net-pci,netdev=mynet0
> > > > 2. Hot add a DIMM device 
> > > > 3. Reboot
> > > >    When the guest reboots, we can see
> > > >    vhost_virtqueue_start:vq->used_phys getting assigned an address that
> > > >    falls in the hotplugged memory range.
> > > > 4. Remove the DIMM device
> > > >    Guest refuses the removal as the hotplugged memory is under use.
> > > > 5. Reboot
> > > >    QEMU forces the removal of the DIMM device during reset and that's
> > > >    when we hit the above assert.
> > > > 
> > > > Any pointers on why we are hitting this assert? Shouldn't vhost be
> > > > done with using the hotplugged memory when we hit reset?
> > > > 
> > > > Regards,
> > > > Bharata."
> > > > 
> > > > #0  0x00007ffff760eff0 in raise () from /lib64/libc.so.6
> > > > #1  0x00007ffff761136c in abort () from /lib64/libc.so.6
> > > > #2  0x00007ffff7604c44 in __assert_fail_base () from /lib64/libc.so.6
> > > > #3  0x00007ffff7604d34 in __assert_fail () from /lib64/libc.so.6
> > > > #4  0x0000000010161138 in vhost_commit (listener=0x11469e88) at /home/greg/Work/qemu/qemu-spapr/hw/virtio/vhost.c:650
> > > > #5  0x00000000100917fc in memory_region_transaction_commit () at /home/greg/Work/qemu/qemu-spapr/memory.c:1094
> > > > #6  0x0000000010096748 in memory_region_del_subregion (mr=0x1143eed0, subregion=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/memory.c:2337
> > > > #7  0x00000000104a9aec in pc_dimm_memory_unplug (dev=0x11445c50, hpms=0x1143eec0, mr=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/hw/mem/pc-dimm.c:126
> > > > #8  0x0000000010180454 in spapr_lmb_release (dev=0x11445c50) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:3151
> > > > #9  0x00000000101a397c in spapr_drc_release (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:401
> > > > #10 0x00000000101a3ba0 in spapr_drc_reset (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:439
> > > > #11 0x00000000101a3c88 in drc_reset (opaque=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:460
> > > > #12 0x0000000010447380 in qemu_devices_reset () at /home/greg/Work/qemu/qemu-spapr/hw/core/reset.c:69
> > > > #13 0x000000001017ae80 in ppc_spapr_reset () at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:1445
> > > > #14 0x0000000010377c60 in qemu_system_reset (reason=SHUTDOWN_CAUSE_HOST_QMP) at /home/greg/Work/qemu/qemu-spapr/vl.c:1788
> > > > #15 0x00000000103785ac in main_loop_should_exit () at /home/greg/Work/qemu/qemu-spapr/vl.c:1962
> > > > #16 0x0000000010378708 in main_loop () at /home/greg/Work/qemu/qemu-spapr/vl.c:1999
> > > > #17 0x0000000010382c54 in main (argc=21, argv=0x7ffffffff098, envp=0x7ffffffff148) at /home/greg/Work/qemu/qemu-spapr/vl.c:4897
> > > > 
> > > > 
> > > > This basically happens because on pseries, like x86, we usually wait
> > > > for the guest to eject the DIMM before actually removing it, BUT,
> > > > unlike x86, we force the removal on reset. This is handled by a DRC
> > > > object which registers a handler with qemu_register_reset().
> > > > 
> > > > At reset time, the machine calls qemu_devices_reset() but unfortunately,
> > > > the DRC reset handler gets called BEFORE the VirtIONet device one. The
> > > > vhost device is still active and it doesn't like the ring addresses to
> > > > change while in this state.
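
For readers who have not looked at that path: a handler registered with qemu_register_reset() runs from qemu_devices_reset() in registration order, with no ordering guarantee relative to the reset that eventually quiesces virtio-net and its vhost backend. Below is a minimal sketch of the shape such a handler takes; MyDRC and my_drc_force_release() are illustrative names rather than the actual sPAPR DRC code, and the header declaring qemu_register_reset() varies between QEMU versions:

/* Illustrative sketch only, not the real hw/ppc/spapr_drc.c code. */
#include "qemu/osdep.h"
#include "hw/hw.h"                  /* qemu_register_reset() in this era */

typedef struct MyDRC {
    int drc_index;                  /* placeholder state */
} MyDRC;

/* Stand-in for the real "force-release the DIMM now" logic. */
static void my_drc_force_release(MyDRC *drc)
{
    /* detach the memory region, finalize the unplug, ... */
}

static void my_drc_reset(void *opaque)
{
    MyDRC *drc = opaque;

    /*
     * Runs during qemu_devices_reset().  Nothing guarantees that the
     * virtio-net device (and therefore the vhost backend) has already
     * been reset when this handler pulls the memory away.
     */
    my_drc_force_release(drc);
}

void my_drc_init(MyDRC *drc)
{
    /* Real API: void qemu_register_reset(QEMUResetHandler *func, void *opaque); */
    qemu_register_reset(my_drc_reset, drc);
}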
> > > > 
> > > > Michael,
> > > > 
> > > > The assert() has been around since the beginning, at a time when I believe
> > > > there was no such thing as memory hot-unplug. Now that memory can go away
> > > > at reset time, is it really legitimate to crash QEMU if vhost detects a
> > > > ring address change?
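
For reference, the check that fires here sits in vhost's MemoryListener commit path: when the memory map changes, vhost re-maps the rings of every started virtqueue and requires them to still be at the host addresses recorded at start time, and the commit path treats a failure as fatal. The following is a simplified sketch modelled on hw/virtio/vhost.c of that era; names and error codes are approximate, and the real code checks the desc and avail rings as well:

/* Simplified sketch, not the exact upstream hw/virtio/vhost.c code. */
#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "exec/cpu-common.h"        /* cpu_physical_memory_map() */
#include "hw/virtio/vhost.h"

static int verify_used_rings_sketch(struct vhost_dev *dev)
{
    int i;

    for (i = 0; i < dev->nvqs; i++) {
        struct vhost_virtqueue *vq = dev->vqs + i;
        hwaddr len = vq->used_size;
        void *p = cpu_physical_memory_map(vq->used_phys, &len, true);

        /* The used ring must still map to the host address it had
         * when the virtqueue was started. */
        if (!p || len != vq->used_size || p != vq->used) {
            error_report("used ring relocated for ring %d", i);
            if (p) {
                cpu_physical_memory_unmap(p, len, true, 0);
            }
            return -EBUSY;
        }
        cpu_physical_memory_unmap(p, len, true, 0);
    }
    return 0;
}

void commit_sketch(struct vhost_dev *dev)
{
    int r = verify_used_rings_sketch(dev);

    /* This is the assertion from the report: a relocated ring during
     * a memory transaction commit is treated as unrecoverable. */
    assert(r >= 0);
    /* ... rebuild the memory table and push it to the vhost backend ... */
}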
> > > 
> > > It's just a symptom of the underlying problem, though. If memory goes
> > > away while the vhost backend is still running, things are not going to
> > > end well: less scary for a network device, more scary for a block
> > > device. VFIO probably has the same issue; it just does not have an
> > > assert.
> > 
> > Hmm, why?  We don't have data structures living in guest RAM with
> > vfio.  Guest RAM is just a mapping through the IOMMU.  So long as the
> > MemoryListener is correctly doing its job, that range will be unmapped
> > and vfio shouldn't care about it.  Thanks,
> > 
> > Alex  
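
To illustrate the contrast on the vfio side: guest RAM is tracked only through a MemoryListener, and region_del simply tears down the IOMMU mapping for the range that is going away; no vfio state lives inside that RAM. A rough sketch of the shape follows, with MyContainer and my_container_dma_unmap() as stand-ins rather than the actual hw/vfio/common.c code:

/* Rough sketch of a vfio-style listener, not the real hw/vfio code. */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include "exec/address-spaces.h"    /* address_space_memory */

typedef struct MyContainer {
    MemoryListener listener;
    int fd;                         /* VFIO container fd */
} MyContainer;

/* Stand-in for issuing VFIO_IOMMU_UNMAP_DMA on c->fd. */
static void my_container_dma_unmap(MyContainer *c, hwaddr iova, hwaddr size)
{
}

static void my_listener_region_del(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    MyContainer *c = container_of(listener, MyContainer, listener);
    hwaddr iova = section->offset_within_address_space;
    hwaddr size = int128_get64(section->size);

    /*
     * Unplugging the DIMM is just "remove this IOMMU mapping".  If the
     * device has not been reset yet and keeps DMAing into the range,
     * those accesses fault at the IOMMU instead of landing in memory
     * that no longer exists.
     */
    my_container_dma_unmap(c, iova, size);
}

void my_container_register(MyContainer *c)
{
    c->listener = (MemoryListener) {
        .region_del = my_listener_region_del,
        /* .region_add would establish mappings symmetrically */
    };
    memory_listener_register(&c->listener, &address_space_memory);
}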
> 
> The range is unmapped by the listener, but the device has not been reset
> yet, so it will get errors when attempting to access guest RAM.

So IOMMU faults would occur if the device continues to try to DMA
into that range.  Sloppy ordering, but vfio wouldn't have any way to
tell the difference between this and a runtime RAM unplug.  Thanks,

Alex
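
Concretely, the unmap behind such a region_del callback is a VFIO_IOMMU_UNMAP_DMA ioctl on the container; once it has been issued, DMA that a not-yet-reset device still attempts into the old range is blocked at the IOMMU, producing the faults described above. A minimal userspace sketch of that single call, assuming container_fd, iova and size come from an existing VFIO setup and omitting error handling:

/* Minimal sketch: unmap one IOVA range from a VFIO type1 container. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int unmap_range(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap;

    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.iova = iova;              /* e.g. guest-physical base of the removed DIMM */
    unmap.size = size;

    /* On success, device DMA into [iova, iova + size) now faults at
     * the IOMMU instead of reaching guest RAM that has gone away. */
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}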


