From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v9 05/11] vfio: add check host bus reset is support or not
Date: Wed, 31 Aug 2016 20:12:42 -0600
On Wed, 31 Aug 2016 13:56:20 -0600
Alex Williamson <address@hidden> wrote:
> On Tue, 19 Jul 2016 15:38:23 +0800
> Zhou Jie <address@hidden> wrote:
>
> > From: Chen Fan <address@hidden>
> >
> > When assigning a vfio device with AER enabled, we must check whether
> > the device supports a host bus reset (i.e. hot reset) as this may be
> > used by the guest OS in order to recover the device from an AER
> > error. QEMU must therefore have the ability to perform a physical
> > host bus reset using the existing vfio APIs in response to a virtual
> > bus reset in the VM. A physical bus reset affects all of the devices
> > on the host bus, therefore we place a few simplifying configuration
> > restrictions on the VM:
> >
> > - All physical devices affected by a bus reset must be assigned to
> > the VM with AER enabled on each and be configured on the same
> > virtual bus in the VM.
> >
> > - No devices unaffected by the bus reset, be they physical, emulated,
> > or paravirtual may be configured on the same virtual bus as a
> > device supporting AER signaling through vfio.
> >
> > In other words users wishing to enable AER on a multifunction device
> > need to assign all functions of the device to the same virtual bus
> > and enable AER support for each device. The easiest way to
> > accomplish this is to identity map the physical functions to virtual
> > functions with multifunction enabled on the virtual device.
>
> Why am I able to start the following VM with aer=on for the vfio-pci
> devices?
>
> # lspci -tv
> -[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
> +-01.0 Device 1234:1111
> +-1c.0-[01]--
> +-1d.0-[02]--+-01.0 Intel Corporation 82576 Gigabit Network Connection
> | \-01.1 Intel Corporation 82576 Gigabit Network Connection
> ...
>
> # lspci -vvv -s 1d.0
> 00:1d.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal decode])
>
> The devices are behind a PCIe-to-PCI bridge, so shouldn't specifying
> aer=on for the vfio-pci devices cause a configuration error?
>
> commandline:
>
> /home/alwillia/local/bin/qemu-system-x86_64 -name
> guest=rhel7-q35,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-rhel7-q35/master-key.aes
> -machine pc-q35-2.7,accel=kvm,usb=off,vmport=off -cpu IvyBridge -m 8192
> -realtime mlock=off -smp 6,sockets=1,cores=6,threads=1 -uuid
> b20b28b4-9304-4e11-9ffa-0367aeb44afb -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-11-rhel7-q35/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global
> ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device
> i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device
> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device
> pci-bridge,chassis_nr=3,id=pci.3,bus=pcie.0,addr=0x1d -device
> ioh3420,port=0xe0,chassis=4,id=pci.4,bus=pcie.0,addr=0x1c -device
> ich9-usb-ehci1,id=usb,bus=pci.2,addr=0x3.0x7 -device
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.2,multifunction=on,addr=0x3
> -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.2,addr=0x3.0x1
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.2,addr=0x3.0x2
> -drive
> file=/dev/rhel/rhel7-q35,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:50:ec:0d,bus=pci.2,addr=0x1
> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
> -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 127.0.0.1:0 -device
> VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -device
> intel-hda,id=sound0,bus=pci.2,addr=0x2 -device
> hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
> vfio-pci,aer=on,host=07:00.0,id=hostdev0,bus=pci.3,multifunction=on,addr=0x1
> -device vfio-pci,aer=on,host=07:00.1,id=hostdev1,bus=pci.3,addr=0x1.0x1 -msg timestamp=on
>
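For reference, the two restrictions in the quoted commit message can be written down as a small check. This is only a sketch in Python, not the QEMU code from the patch: the `GuestDev` type, the `aer_config_errors` function and the `reset_group` mapping are hypothetical names, and the set of host devices affected by a bus reset is hard-coded here rather than queried from the host. It also does not model whether the virtual bus is PCI Express at all, which is what the question above is about.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class GuestDev:
    """Hypothetical model of one device in the VM configuration."""
    name: str
    virt_bus: str          # virtual bus the device is plugged into
    host: Optional[str]    # host PCI address if assigned, None if emulated
    aer: bool = False      # True when aer=on was requested


def aer_config_errors(guest_devs: List[GuestDev],
                      reset_group: Dict[str, Set[str]]) -> List[str]:
    """Return one message per violation of the two rules.

    reset_group maps a host address to every host address affected by a
    bus reset of that device (assumed to be known in advance).
    """
    errors = []
    assigned = {d.host: d for d in guest_devs if d.host is not None}
    for dev in (d for d in guest_devs if d.aer):
        group = reset_group[dev.host]
        # Rule 1: every affected host device is assigned, has AER enabled,
        # and sits on the same virtual bus.
        for host in sorted(group):
            peer = assigned.get(host)
            if peer is None:
                errors.append(f"{dev.name}: {host} is affected but not assigned")
            elif not peer.aer:
                errors.append(f"{dev.name}: {peer.name} does not have AER enabled")
            elif peer.virt_bus != dev.virt_bus:
                errors.append(f"{dev.name}: {peer.name} is on another virtual bus")
        # Rule 2: nothing unaffected by the reset shares the virtual bus.
        for other in guest_devs:
            if other.virt_bus == dev.virt_bus and other.host not in group:
                errors.append(f"{dev.name}: {other.name} shares {dev.virt_bus}")
    return errors


# The two 82576 functions from the command line above, both with aer=on.
group = {"07:00.0": {"07:00.0", "07:00.1"}, "07:00.1": {"07:00.0", "07:00.1"}}
both = [GuestDev("hostdev0", "pci.3", "07:00.0", True),
        GuestDev("hostdev1", "pci.3", "07:00.1", True)]
with_emulated = both + [GuestDev("net0", "pci.3", None)]
```

With both functions assigned to `pci.3` the check passes. Dropping one function, or placing an emulated device on the same virtual bus, produces errors.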
I had to move to a different system where I could actually inject an
AER error, and created a config similar to the one above but with the
82576 ports downstream of the ioh3420 root port. When I inject a
malformed TLP uncorrectable error, my RHEL7.2 guest does this:
[ 35.995645] pcieport 0000:00:1c.0: AER: Multiple Uncorrected (Fatal) error received: id=0200
[ 35.998483] igb 0000:02:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Unaccessible, id=0200(Unregistered Agent ID)
[ 36.001965] igb 0000:02:00.0 enp2s0f0: PCIe link lost, device now detached
[ 36.015092] igb 0000:02:00.1 enp2s0f1: PCIe link lost, device now detached
[ 39.133185] igb 0000:02:00.0: enabling device (0000 -> 0002)
[ 40.071245] igb 0000:02:00.1: enabling device (0000 -> 0002)
[ 41.014451] BUG: unable to handle kernel paging request at 0000000000003818
[ 41.015969] IP: [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[ 41.017507] PGD 367e2067 PUD 7ae56067 PMD 0
[ 41.018497] Oops: 0002 [#1] SMP
[ 41.019242] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw iptable_filter snd_hda_codec_generic snd_hda_intel
snd_hda_codec ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt
iTCO_vendor_support bochs_drm snd_pcm syscopyarea sysfillrect sysimgblt ttm
virtio_balloon snd_timer snd igb drm_kms_helper soundcore ptp pps_core
i2c_algo_bit i2c_i801 dca drm shpchp lpc_ich mfd_core pcspkr i2c_core
parport_pc parport ip_tables xfs libcrc32c virtio_blk virtio_console virtio_net
ahci libahci crc32c_intel serio_raw libata virtio_pci virtio_ring virtio
dm_mirror dm_region_hash dm_log dm_mod
[ 41.040590] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 3.10.0-327.el7.x86_64 #1
[ 41.042180] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[ 41.044635] Workqueue: events aer_isr
[ 41.045478] task: ffff880179435080 ti: ffff880179680000 task.ti: ffff880179680000
[ 41.047097] RIP: 0010:[<ffffffffa02b438d>] [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[ 41.049151] RSP: 0018:ffff880179683bf8 EFLAGS: 00010246
[ 41.050260] RAX: 0000000000003818 RBX: 0000000000000000 RCX: 0000000000003818
[ 41.051747] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00000000002896b3
[ 41.053268] RBP: ffff880179683c20 R08: 0000000001010100 R09: 00000000ffffffe7
[ 41.054730] R10: ffffea0001eb6100 R11: ffffffffa02afa31 R12: 0000000000000000
[ 41.056201] R13: ffff880035dbc8c0 R14: ffff880175d03f80 R15: 000000017716e000
[ 41.057673] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000)
knlGS:0000000000000000
[ 41.059337] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 41.060548] CR2: 0000000000003818 CR3: 0000000178331000 CR4: 00000000000006f0
[ 41.062028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 41.063534] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 41.065025] Stack:
[ 41.065473] ffff880035dbc8c0 ffff880035dbce70 0000000000000001
ffff880035dbc8c8
[ 41.067119] ffff880035dbce70 ffff880179683c80 ffffffffa02b8a77
fefdf27269fb3cd8
[ 41.068781] 2009f9ee3386436f eb9e4e66756bbfdd 34002f8114a5d65f
9535990856231c4b
[ 41.094179] Call Trace:
[ 41.118688] [<ffffffffa02b8a77>] igb_configure+0x267/0x450 [igb]
[ 41.144286] [<ffffffffa02b94f1>] igb_up+0x21/0x1a0 [igb]
[ 41.170606] [<ffffffffa02b96a7>] igb_io_resume+0x37/0x70 [igb]
[ 41.195846] [<ffffffff813381e0>] ? pci_cleanup_aer_uncorrect_error_status+0x90/0x90
[ 41.221767] [<ffffffff81338228>] report_resume+0x48/0x60
[ 41.246455] [<ffffffff8131e359>] pci_walk_bus+0x79/0xa0
[ 41.270722] [<ffffffff813381e0>] ? pci_cleanup_aer_uncorrect_error_status+0x90/0x90
[ 41.296747] [<ffffffff813382f0>] broadcast_error_message+0xb0/0x100
[ 41.321552] [<ffffffff81338509>] do_recovery+0x1c9/0x280
[ 41.345507] [<ffffffff81338f58>] aer_isr+0x348/0x430
[ 41.368851] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[ 41.392157] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[ 41.416852] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[ 41.441577] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[ 41.465029] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 41.488341] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[ 41.511247] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[ 41.535442] Code: c1 49 89 4e 30 49 8b 85 b8 05 00 00 48 85 c0 0f 84 39 01
00 00 81 c2 10 38 00 00 48 63 d2 48 01 d0 31 d2 89 10 49 8b 46 30 31 d2 <89> 10
41 8b 95 3c 06 00 00 b8 14 01 10 02 83 fa 05 74 0b 83 fa
[ 41.587718] RIP [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[ 41.610872] RSP <ffff880179683bf8>
[ 41.632301] CR2: 0000000000003818
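As a reading aid for the trace above, the "Oops:" value is the x86 page-fault error code and CR2 holds the faulting address. Here is a minimal Python sketch that decodes the low three bits of that code; the `decode_page_fault` helper is a hypothetical name and the sample text is just two lines copied from the log.

```python
import re


def decode_page_fault(oops_text: str) -> dict:
    """Pull the page-fault error code and CR2 out of an oops excerpt."""
    code = int(re.search(r"Oops: ([0-9a-fA-F]{4})", oops_text).group(1), 16)
    cr2 = int(re.search(r"CR2: ([0-9a-fA-F]{16})", oops_text).group(1), 16)
    return {
        "address": cr2,                    # faulting linear address
        "page_present": bool(code & 0x1),  # bit 0: protection fault vs. not-present
        "write": bool(code & 0x2),         # bit 1: write access
        "user_mode": bool(code & 0x4),     # bit 2: fault raised in user mode
    }


sample = """\
[ 41.018497] Oops: 0002 [#1] SMP
[ 41.060548] CR2: 0000000000003818 CR3: 0000000178331000 CR4: 00000000000006f0
"""
info = decode_page_fault(sample)
```

For this log the result is a kernel-mode write to a not-present page at 0x3818, the same value as RAX in the register dump. Such a small address looks like an offset from a NULL base pointer, but that is an inference, not something the log states.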
And then it reboots. So what RAS improvement have we bought ourselves
here? What endpoints have you tested with this? Which ones recovered
reliably? Thanks,
Alex