[Qemu-devel] vfio-pci freezes host
From: Harald Braumann
Subject: [Qemu-devel] vfio-pci freezes host
Date: Sat, 9 Nov 2013 02:33:59 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
(please CC as I'm not subscribed)
Hi,
I'm passing through a GPU using vfio-pci. This regularly freezes the
host completely. I'm hoping the attached files give some clue as to
what the problem might be.
Specs:
Chipset: AMD 990FX
Kernel: 3.12.0
QEMU: latest as of today (commit 964668b03d26f0b5baa5e5aff0c966f4fcb76e9e)
GPU:
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Juniper HDMI Audio [Radeon HD 5700 Series]
QEMU command line:
/home/harry/dev/kvm-gpu-passthrough/qemu/x86_64-softmmu/qemu-system-x86_64 \
-runas spielzeug \
-monitor unix:monitor,server,nowait \
-L /home/harry/dev/kvm-gpu-passthrough/qemu/pc-bios \
-drive file=spielzeug_tmp.qcow2,if=virtio,cache=none,media=disk \
-boot order=c \
-smp 4 \
-cpu host \
-m 4096M \
-net nic,model=virtio,macaddr=52:54:00:12:34:57 \
-net tap,ifname=tap0,script=no,downscript=no \
-localtime \
-enable-kvm \
-M q35 \
-vga none \
-nographic \
-device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1,romfile=radeon-hd-5770.rom \
-device vfio-pci,host=0000:06:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
-device vfio-pci,host=0000:06:00.1,bus=root.1,addr=00.1 \
-usbdevice tablet
QEMU starts up and after a few seconds the host completely
freezes. Sometimes I'm still able to get some dmesg output or a kernel
panic. In these cases it is always some other PCI device that reports
an error.
Example:
[ 179.998189] ------------[ cut here ]------------
[ 179.998211] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0xd9/0x13f()
[ 179.998228] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 179.998229] Modules linked in: tun vfio_pci vfio_iommu_type1 vfio vboxpci(O)
vboxnetadp(O) binfmt_misc vboxnetflt(O) vboxdrv(O) deflate ctr twofish_generic
twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common
camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64
serpent_sse2_x86_64 xts serpent_generic blowfish_generic blowfish_x86_64
blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cbc cmac
xcbc rmd160 sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic crypto_null
af_key xfrm_algo bridge stp llc iptable_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables
x_tables ext2 it87 hwmon_vid fuse joydev hid_generic radeon snd_hda_codec_hdmi
usbhid snd_hda_codec_realtek hid snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm_oss ttm snd_mixer_oss drm_kms_helper kvm_amd kvm snd_pcm drm
snd_page_alloc snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event
snd_rawmidi sp5100_tco mxm_wmi agpgart snd_seq i2c_piix4 i2c_algo_bit i2c_core
fam15h_power microcode pcspkr evdev snd_seq_device wmi k10temp snd_timer button
snd processor soundcore edac_core ohci_pci thermal_sys ohci_hcd ext4 crc16 jbd2
mbcache dm_crypt dm_mod md_mod pci_stub sg sr_mod cdrom sd_mod crc_t10dif
crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
firewire_ohci firewire_core crc_itu_t r8169 mii ehci_pci ehci_hcd xhci_hcd
usbcore usb_common ahci libahci libata scsi_mod
[ 179.998304] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.12.0-hb #1
[ 179.998306] Hardware name: To be filled by O.E.M. To be filled by
O.E.M./SABERTOOTH 990FX, BIOS 1208 04/18/2012
[ 179.998313] 0000000000000000 ffffffff81390b45 ffff88024ecc3e30
ffffffff81036e55
[ 179.998316] ffffffff812efdbe ffff880241240000 ffff88024ecc3e80
ffffffff812efce5
[ 179.998318] ffff880241240348 ffffffff81036eb1 ffffffff81526cee
0000000000000030
[ 179.998324] Call Trace:
[ 179.998326] <IRQ> [<ffffffff81390b45>] ? dump_stack+0x41/0x51
[ 179.998333] [<ffffffff81036e55>] ? warn_slowpath_common+0x74/0x89
[ 179.998336] [<ffffffff812efdbe>] ? dev_watchdog+0xd9/0x13f
[ 179.998338] [<ffffffff812efce5>] ? dev_deactivate_queue+0x54/0x54
[ 179.998340] [<ffffffff81036eb1>] ? warn_slowpath_fmt+0x47/0x49
[ 179.998341] [<ffffffff812ef9e8>] ? netif_tx_lock+0x47/0x72
[ 179.998345] [<ffffffff812efdbe>] ? dev_watchdog+0xd9/0x13f
[ 179.998347] [<ffffffff8103fd35>] ? call_timer_fn+0x2d/0xdc
[ 179.998350] [<ffffffff81040677>] ? run_timer_softirq+0x18c/0x1b0
[ 179.998351] [<ffffffff812efce5>] ? dev_deactivate_queue+0x54/0x54
[ 179.998353] [<ffffffff8103a68a>] ? __do_softirq+0xc3/0x1df
[ 179.998355] [<ffffffff81396cdc>] ? call_softirq+0x1c/0x30
[ 179.998357] [<ffffffff8100422a>] ? do_softirq+0x2a/0x64
[ 179.998359] [<ffffffff8103a866>] ? irq_exit+0x3a/0x7a
[ 179.998361] [<ffffffff81024111>] ? smp_apic_timer_interrupt+0x2c/0x37
[ 179.998363] [<ffffffff8139620a>] ? apic_timer_interrupt+0x6a/0x70
[ 179.998365] <EOI> [<ffffffff81077257>] ? clockevents_program_event+0x98/0xb4
[ 179.998368] [<ffffffff812af2f7>] ? cpuidle_enter_state+0x4d/0x9e
[ 179.998376] [<ffffffff812af421>] ? cpuidle_idle_call+0xd9/0x12e
[ 179.998379] [<ffffffff81009e72>] ? arch_cpu_idle+0x5/0x14
[ 179.998382] [<ffffffff8106c212>] ? cpu_startup_entry+0x102/0x152
[ 179.998385] [<ffffffff81022d2d>] ? start_secondary+0x1d9/0x1dd
[ 179.998387] ---[ end trace 206ceb71b6aa3a0a ]---
[ 180.023699] r8169 0000:09:00.0 eth0: link up
[ 230.425196] kvm: zapping shadow pages for mmio generation wraparound
[ 240.028560] SysRq : Emergency Sync
[ 240.632371] Emergency Sync complete
[ 244.008964] br0: port 2(tap0) entered disabled state
Other example:
[ 165.586276] usb 9-3: USB disconnect, device number 2
[ 165.596353] r8169 0000:09:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[ 165.597471] r8169 0000:09:00.0 eth0: link up
[ 165.622236] r8169 0000:09:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[ 165.627619] r8169 0000:09:00.0 eth0: link down
[ 165.627765] br0: port 1(eth0) entered disabled state
[ 165.712495] ohci-pci 0000:00:13.0: leak ed ffff880243a010a0 (#81) state 0 (has tds)
[ 165.712498] ohci-pci 0000:00:13.0: leak ed ffff880243a01050 (#82) state 0 (has tds)
[ 166.205984] irq 20: nobody cared (try booting with the "irqpoll" option)
[ 166.205988] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G O 3.12.0-hb #1
[ 166.205989] Hardware name: To be filled by O.E.M. To be filled by
O.E.M./SABERTOOTH 990FX, BIOS 1208 04/18/2012
[ 166.205991] 0000000000000000 ffffffff81390b45 ffff880244c90d00
ffffffff8106e18c
[ 166.205993] ffff880244c90d00 0000000000000000 ffff880244c90d00
ffffffff8106e4ed
[ 166.205995] 0000000000000000 0000000000000014 ffff880244c90d00
0000000000000000
[ 166.205997] Call Trace:
[ 166.205998] <IRQ> [<ffffffff81390b45>] ? dump_stack+0x41/0x51
[ 166.206005] [<ffffffff8106e18c>] ? __report_bad_irq+0x2c/0xb4
[ 166.206008] [<ffffffff8106e4ed>] ? note_interrupt+0x136/0x1b3
[ 166.206010] [<ffffffff8106c9af>] ? handle_irq_event_percpu+0x105/0x16c
[ 166.206012] [<ffffffff8106ca41>] ? handle_irq_event+0x2b/0x46
[ 166.206014] [<ffffffff8106ece9>] ? handle_fasteoi_irq+0x71/0xa1
[ 166.206016] [<ffffffff810041f8>] ? handle_irq+0x15/0x1d
[ 166.206018] [<ffffffff81003e8e>] ? do_IRQ+0x40/0x95
[ 166.206020] [<ffffffff81394e2a>] ? common_interrupt+0x6a/0x6a
[ 166.206021] <EOI> [<ffffffff812af2f7>] ? cpuidle_enter_state+0x4d/0x9e
[ 166.206040] [<ffffffff812af421>] ? cpuidle_idle_call+0xd9/0x12e
[ 166.206042] [<ffffffff81009e72>] ? arch_cpu_idle+0x5/0x14
[ 166.206044] [<ffffffff8106c212>] ? cpu_startup_entry+0x102/0x152
[ 166.206047] [<ffffffff81022d2d>] ? start_secondary+0x1d9/0x1dd
[ 166.206048] handlers:
[ 166.206059] [<ffffffffa0093fa6>] usb_hcd_irq [usbcore]
[ 166.206060] Disabling IRQ #20
[ 304.730601] br0: port 2(tap0) entered disabled state
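Since the second trace ends with IRQ 20 being disabled, one thing worth
checking is which handlers are registered on that line. A minimal sketch,
assuming the usual /proc/interrupts layout (IRQ number plus a colon in the
first column); the function name irq_line and its optional file argument are
made up for illustration and are not part of any tool:

```shell
#!/bin/sh
# irq_line IRQ [FILE]: print the /proc/interrupts row for one IRQ number.
# FILE defaults to /proc/interrupts; the second argument only exists so the
# function can also be run on a saved copy (hypothetical helper, not a real tool).
irq_line() {
    irq="$1"
    file="${2:-/proc/interrupts}"
    # The first field looks like "20:"; compare it exactly so that
    # asking for 20 does not also match 120.
    awk -v want="${irq}:" '$1 == want { print }' "$file"
}
```

For example, `irq_line 20` on the live host, or `irq_line 20 interrupts` on a
saved copy. If more than one handler is listed at the end of that row, the
line is shared between devices.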
Quite often the SATA controller reports an error (see ahci-error.jpg).
Another symptom was a flood of "[R600] flush TLB failed" messages in
dmesg for some time, after which the host froze.
Attached is a tgz with the following files:
- ahci-error.jpg
- dmesg
- interrupts: copy of /proc/interrupts
- pci-dump: produced with lspci -vvvxxx
- qemu-config.log: config.log from QEMU source
- vfio.log: output of QEMU with vfio debugging enabled
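For anyone who wants to gather a comparable set of host-side files, a rough
sketch follows. It only covers the three dumps that come straight from the
host (interrupts, pci-dump, dmesg); the function name collect_dumps and the
output layout are assumptions for illustration, and lspci -vvvxxx typically
needs root to read the full config space:

```shell
#!/bin/sh
# collect_dumps OUTDIR: save host state into OUTDIR and pack it as OUTDIR.tgz.
# Hypothetical helper; each step warns and continues if a command fails.
collect_dumps() {
    out="$1"
    mkdir -p "$out" || return 1
    # Copy of /proc/interrupts
    cat /proc/interrupts > "$out/interrupts" 2>/dev/null \
        || echo "warning: could not read /proc/interrupts" >&2
    # Verbose PCI dump including hex config space
    lspci -vvvxxx > "$out/pci-dump" 2>/dev/null \
        || echo "warning: lspci failed or is not installed" >&2
    # Kernel ring buffer
    dmesg > "$out/dmesg" 2>/dev/null \
        || echo "warning: dmesg failed" >&2
    # Pack everything into a single gzip-compressed tarball next to OUTDIR
    tar -czf "$out.tgz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Usage: `collect_dumps /tmp/vfio-freeze-dumps`, run shortly before starting
QEMU so the files reflect the state the host was in.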
Cheers,
harry
vfio-freeze-dumps.tgz
Description: application/gtar-compressed