qemu-devel

Re: [Qemu-devel] Problems with qemu "modern" virtio on sparc64


From: Mark Cave-Ayland
Subject: Re: [Qemu-devel] Problems with qemu "modern" virtio on sparc64
Date: Fri, 6 Jan 2017 17:04:58 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.5.1

On 30/12/16 19:57, Guenter Roeck wrote:

> On 12/30/2016 10:18 AM, Mark Cave-Ayland wrote:
>> On 25/11/16 18:11, Guenter Roeck wrote:
>>
>>> Hi,
>>>
>>> I am using virtio on sparc64 for my Linux kernel runtime tests.
>>>
>>> Starting with qemu v2.7, I noticed that the kernel either gets stuck or
>>> crashes. After adding some debug information to the kernel, I found that
>>> the problem happens in vp_reset().
>>>
>>> Interestingly, when running v4.9-rc6 without modification, the kernel
>>> crashes on me. If I add pr_info just before and after the vp_iowrite8()
>>> in virtio_pci_modern.c:vp_reset(), the kernel gets stuck in the
>>> vp_iowrite8().
>>>
>>> Here is the relevant part of the crash:
>>>
>>> [    3.151167] Unable to handle kernel NULL pointer dereference
>>> [    3.151809] tsk->{mm,active_mm}->context = 0000000000000000
>>> [    3.152430] tsk->{mm,active_mm}->pgd = fffff80000402000
>>> [    3.153032]               \|/ ____ \|/
>>> [    3.153032]               "@'/ .. \`@"
>>> [    3.153032]               /_| \__/ |_\
>>> [    3.153032]                  \__U_/
>>> [    3.154042] swapper(1): Oops [#1]
>>> [    3.154773] CPU: 0 PID: 1 Comm: swapper Not tainted 4.9.0-rc5+ #4
>>> [    3.155375] task: fffff8001f0af620 task.stack: fffff8001f0b0000
>>> [    3.155958] TSTATE: 0000009980001606 TPC: 00000000006edf44 TNPC: 00000000006edf48 Y: 00000000    Not tainted
>>> [    3.156901] TPC: <vp_reset+0x4/0x40>
>>>
>>> None of the pointers used in vp_reset() is NULL. As mentioned above,
>>> adding a pr_info just before vp_iowrite8() makes the crash disappear, and
>>> the kernel gets stuck instead. Here is what it looks like:
>>>
>>> [    3.104243] Hi there
>>> [   26.912509] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:1]
>>> [   26.913102] Modules linked in:
>>> [   26.914061] CPU: 0 PID: 1 Comm: swapper Not tainted 4.9.0-rc5+ #5
>>> [   26.914633] task: fffff8001f0af620 task.stack: fffff8001f0b0000
>>> [   26.915156] TSTATE: 0000004480001605 TPC: 00000000006edf50 TNPC: 00000000006edf54 Y: 00000412    Not tainted
>>> [   26.915954] TPC: <vp_reset+0x10/0x60>
>>>
>>> Another pr_info() after vp_iowrite8() is never printed, suggesting that
>>> the code never gets to that point.
>>>
>>> The kernel configuration is sparc64_defconfig with the following
>>> configuration options enabled:
>>>
>>> CONFIG_DEVTMPFS=y
>>> CONFIG_VIRTIO=y
>>> CONFIG_VIRTIO_PCI=y
>>> CONFIG_VIRTIO_BLK=y
>>> CONFIG_VIRTIO_NET=y
>>> CONFIG_VIRTIO_BALLOON=y
>>> CONFIG_VIRTIO_CONSOLE=y
>>> CONFIG_SCSI_VIRTIO=y
>>>
>>> The command line is:
>>>
>>> qemu-system-sparc64 -M sun4u -cpu "TI UltraSparc IIi" -m 512 \
>>>     -drive file=simple-root-filesystem-sparc.ext3,if=virtio,format=raw \
>>>     -kernel arch/sparc/boot/image -no-reboot \
>>>     -append "root=/dev/vda init=/sbin/init.sh console=ttyS0" \
>>>     -nographic -monitor none
>>>
>>> Does anyone have an idea what might be wrong?
>>>
>>> Thanks,
>>> Guenter
>>
>> Hi Guenter,
>>
>> Have you been able to investigate this issue any further? Does the 2.8
>> release solve the issue for you?
>>
> 
> I did not make any progress, and reverted to qemu v2.6.
> 
> The problem is still seen with v2.8 (release); it crashes. The recent
> virtio-related patch does not make a difference. v2.7.1 also still
> crashes. The only difference between the two versions is the crash
> traceback.
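
For reference, the virtio 1.0 specification describes the reset that vp_reset() is meant to perform on a modern device: the driver writes 0 to device_status and must then wait until a read of device_status returns 0 before reinitializing the device. The following is a minimal userspace sketch of that handshake only, assuming a hypothetical in-memory device (struct fake_virtio_dev, dev_write_status(), dev_read_status() and driver_reset() are invented for illustration and are not kernel or QEMU code). It does not reproduce the sparc64 crash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a virtio device: only the device_status
 * byte and the reset timing are modelled. Not kernel or QEMU code. */
struct fake_virtio_dev {
	uint8_t device_status;	/* value a driver read returns */
	int reset_pending;	/* set once the driver has written 0 */
	int reset_latency;	/* reads that still return stale status */
};

/* Driver write to device_status: writing 0 requests a reset. */
static void dev_write_status(struct fake_virtio_dev *d, uint8_t val)
{
	if (val == 0)
		d->reset_pending = 1;	/* old status stays visible for now */
	else
		d->device_status = val;
}

/* Driver read of device_status: the fake device finishes its reset
 * only after reset_latency reads have returned the stale value. */
static uint8_t dev_read_status(struct fake_virtio_dev *d)
{
	if (d->reset_pending) {
		if (d->reset_latency > 0) {
			d->reset_latency--;
		} else {
			d->device_status = 0;
			d->reset_pending = 0;
		}
	}
	return d->device_status;
}

/* Reset handshake: write 0, then poll until device_status reads 0.
 * Returns the number of reads needed, or -1 once max_polls reads have
 * all returned non-zero (a driver without such a bound would spin). */
static int driver_reset(struct fake_virtio_dev *d, int max_polls)
{
	assert(max_polls > 0);
	dev_write_status(d, 0);
	for (int i = 1; i <= max_polls; i++) {
		if (dev_read_status(d) == 0)
			return i;
	}
	return -1;
}
```

With a device initialised as { 0x0f, 0, 2 } (two stale reads before the reset completes), driver_reset(&d, 10) returns 3. The max_polls bound exists only so the sketch always terminates.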

I can recreate a similar crash here, and it seems to be caused by using
virtio with legacy mode disabled (note that I'm testing a virtio patch
for OpenBIOS, which explains the different command line):

$ ./qemu-system-sparc64 \
-drive file=debian-9.0-sparc64-NETINST1.iso,if=none,index=0,id=cd,media=cdrom \
-device virtio-blk-pci,drive=cd \
-nographic \
-bios openbios-builtin.elf.nostrip \
-m 256

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.10.24 1999/01/01 01:01'
[    0.000000] PROMLIB: Root node compatible: sun4u
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.3.0-1-sparc64 (address@hidden) (gcc version 4.9.3 (Debian 4.9.3-10) ) #1 Debian 4.3.3-2 (2015-12-17)
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U

(lots cut)

[   14.971631] Unpacking initramfs...
[   15.662728] Freeing initrd memory: 8296K (fffff80004400000 - fffff80004c1a000)
[   15.667565] futex hash table entries: 256 (order: -1, 6144 bytes)
[   15.668290] audit: initializing netlink subsys (disabled)
[   15.669175] audit: type=2000 audit(1.116:1): initialized
[   15.673590] HugeTLB registered 8 MB page size, pre-allocated 0 pages
[   15.674040] zbud: loaded
[   15.676552] VFS: Disk quotas dquot_6.6.0
[   15.676772] VFS: Dquot-cache hash table entries: 1024 (order 0, 8192 bytes)
[   15.705203] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[   15.705981] io scheduler noop registered
[   15.706114] io scheduler deadline registered
[   15.706854] io scheduler cfq registered (default)
[   15.713107] ffe30ea0: ttyS0 at MMIO 0x1fe020043f8 (irq = 5, base_baud = 115387) is a 16550A
[   15.713393] Console: ttyS0 (SU)
[   15.747368] console [ttyS0] enabled
[   15.752638] mousedev: PS/2 mouse device common for all mice
[   15.762844] rtc-m48t59 rtc-m48t59.0: rtc core: registered m48t59 as rtc0
[   15.764132] ledtrig-cpu: registered to indicate activity on CPUs
[   15.766301] NET: Registered protocol family 10
[   15.783012] mip6: Mobile IPv6
[   15.783469] NET: Registered protocol family 17
[   15.783979] mpls_gso: MPLS GSO support
[   15.787055] registered taskstats version 1
[   15.788251] zswap: loaded using pool lzo/zbud
[   15.793110] rtc-m48t59 rtc-m48t59.0: setting system clock to 2017-01-06 16:45:13 UTC (1483721113)
[   16.232741] random: systemd-udevd urandom read with 0 bits of entropy available
[   17.364137] ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
[   17.419850] ne2k-pci 0000:00:04.0 eth0: RealTek RTL-8029 found at 0x1fe02008000, IRQ 6, 52:54:00:12:34:56.
[   17.437419] Unable to handle kernel NULL pointer dereference
[   17.437894] tsk->{mm,active_mm}->context = 0000000000000020
[   17.438299] tsk->{mm,active_mm}->pgd = fffff8000f53a000
[   17.438823]               \|/ ____ \|/
[   17.438823]               "@'/ .. \`@"
[   17.438823]               /_| \__/ |_\
[   17.438823]                  \__U_/
[   17.439865] systemd-udevd(63): Oops [#1]
[   17.440435] CPU: 0 PID: 63 Comm: systemd-udevd Not tainted 4.3.0-1-sparc64 #1 Debian 4.3.3-2
[   17.441063] task: fffff8000f4280e0 ti: fffff8000f590000 task.ti: fffff8000f590000
[   17.441601] TSTATE: 0000009911001601 TPC: 000000001002426c TNPC: 0000000010024270 Y: 00000000    Not tainted
[   17.442760] TPC: <vp_reset+0xc/0x40 [virtio_pci]>
[   17.443210] g0: 00000000006b03c0 g1: 000001ff04040014 g2: fffffffffffb0000 g3: 0000000000000000
[   17.443804] g4: fffff8000f4280e0 g5: 00000000002ca680 g6: fffff8000f590000 g7: 0000000000000001
[   17.444400] o0: 0000000000000001 o1: fffff8000f5903f8 o2: 0000000010024264 o3: 0000000000000000
[   17.444993] o4: 0000000000000032 o5: 0000000000000000 sp: fffff8000f592991 ret_pc: 000000000040564c
[   17.445613] RPC: <__spitfire_cee_trap_continue+0xb8/0xc8>
[   17.446027] l0: 0000000000001000 l1: 0000009911001600 l2: 0000000010024260 l3: 0000000000000400
[   17.446691] l4: 0000000000000000 l5: 0000000000000001 l6: 0000000000000000 l7: 0000000000000008
[   17.447269] i0: 0000000000000000 i1: 000000001000ad08 i2: fffff8000f593380 i3: fffff8000f41a880
[   17.447844] i4: fffff8000f41a888 i5: fffff8000f5b92a0 i6: fffff8000f592a41 i7: 0000000000764384
[   17.448447] I7: <dev_set_name+0x24/0x40>
[   17.448768] Call Trace:
[   17.449032]  [0000000000764384] dev_set_name+0x24/0x40
[   17.449466]  [000000001000a4e4] register_virtio_device+0x64/0x100 [virtio]
[   17.449919]  [0000000010025820] virtio_pci_probe+0xa0/0x160 [virtio_pci]
[   17.450426]  [00000000006eaf40] pci_device_probe+0x80/0x100
[   17.450799]  [000000000076928c] driver_probe_device+0x16c/0x480
[   17.451194]  [0000000000769628] __driver_attach+0x88/0xa0
[   17.451565]  [0000000000766fdc] bus_for_each_dev+0x5c/0xa0
[   17.451934]  [0000000000768c1c] driver_attach+0x1c/0x40
[   17.452300]  [0000000000768710] bus_add_driver+0x1f0/0x2a0
[   17.452668]  [000000000076a114] driver_register+0x74/0x120
[   17.453030]  [00000000006e9894] __pci_register_driver+0x34/0x60
[   17.453467]  [000000001002a018] virtio_pci_driver_init+0x18/0x28 [virtio_pci]
[   17.453928]  [0000000000426c58] do_one_initcall+0xb8/0x200
[   17.454294]  [0000000000525200] do_init_module+0x50/0x1f0
[   17.454728]  [00000000004c2514] load_module+0x1c54/0x23c0
[   17.455092]  [00000000004c2e74] SyS_finit_module+0x94/0xe0
[   17.455507] Disabling lock debugging due to kernel taint
[   17.455957] Caller[0000000000764384]: dev_set_name+0x24/0x40
[   17.456380] Caller[000000001000a4e4]: register_virtio_device+0x64/0x100 [virtio]
[   17.456875] Caller[0000000010025820]: virtio_pci_probe+0xa0/0x160 [virtio_pci]
[   17.457355] Caller[00000000006eaf40]: pci_device_probe+0x80/0x100
[   17.457753] Caller[000000000076928c]: driver_probe_device+0x16c/0x480
[   17.458167] Caller[0000000000769628]: __driver_attach+0x88/0xa0
[   17.458642] Caller[0000000000766fdc]: bus_for_each_dev+0x5c/0xa0
[   17.459038] Caller[0000000000768c1c]: driver_attach+0x1c/0x40
[   17.459416] Caller[0000000000768710]: bus_add_driver+0x1f0/0x2a0
[   17.459808] Caller[000000000076a114]: driver_register+0x74/0x120
[   17.460208] Caller[00000000006e9894]: __pci_register_driver+0x34/0x60
[   17.460631] Caller[000000001002a018]: virtio_pci_driver_init+0x18/0x28 [virtio_pci]
[   17.461134] Caller[0000000000426c58]: do_one_initcall+0xb8/0x200
[   17.461526] Caller[0000000000525200]: do_init_module+0x50/0x1f0
[   17.461912] Caller[00000000004c2514]: load_module+0x1c54/0x23c0
[   17.462295] Caller[00000000004c2e74]: SyS_finit_module+0x94/0xe0
[   17.462758] Caller[00000000004060f4]: linux_sparc_syscall+0x34/0x44
[   17.463195] Caller[fffff801005fe708]: 0xfffff801005fe708
[   17.463597] Instruction DUMP: 9de3bf50  01000000  01000000 <c25e22f8> 82006014  c0a843a0  c25e22f8  82006014  c28843a0

Disabling "modern" mode allows the boot to proceed as normal:

$ ./qemu-system-sparc64 \
-drive file=debian-9.0-sparc64-NETINST1.iso,if=none,index=0,id=cd,media=cdrom \
-device virtio-blk-pci,disable-modern=on,drive=cd \
-nographic \
-bios openbios-builtin.elf.nostrip \
-m 256

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 3.10.24 1999/01/01 01:01'
[    0.000000] PROMLIB: Root node compatible: sun4u
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 4.3.0-1-sparc64 (address@hidden) (gcc version 4.9.3 (Debian 4.9.3-10) ) #1 Debian 4.3.3-2 (2015-12-17)
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4U

(lots cut)

[   11.390769] Unpacking initramfs...
[   12.082838] Freeing initrd memory: 8296K (fffff80004400000 - fffff80004c1a000)
[   12.087735] futex hash table entries: 256 (order: -1, 6144 bytes)
[   12.088461] audit: initializing netlink subsys (disabled)
[   12.089350] audit: type=2000 audit(1.116:1): initialized
[   12.094237] HugeTLB registered 8 MB page size, pre-allocated 0 pages
[   12.094830] zbud: loaded
[   12.097379] VFS: Disk quotas dquot_6.6.0
[   12.097616] VFS: Dquot-cache hash table entries: 1024 (order 0, 8192 bytes)
[   12.126631] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[   12.127420] io scheduler noop registered
[   12.127555] io scheduler deadline registered
[   12.127905] io scheduler cfq registered (default)
[   12.133862] ffe30ea0: ttyS0 at MMIO 0x1fe020043f8 (irq = 5, base_baud = 115387) is a 16550A
[   12.134133] Console: ttyS0 (SU)
[   12.188942] console [ttyS0] enabled
[   12.195464] mousedev: PS/2 mouse device common for all mice
[   12.204837] rtc-m48t59 rtc-m48t59.0: rtc core: registered m48t59 as rtc0
[   12.206309] ledtrig-cpu: registered to indicate activity on CPUs
[   12.208736] NET: Registered protocol family 10
[   12.227061] mip6: Mobile IPv6
[   12.227679] NET: Registered protocol family 17
[   12.228391] mpls_gso: MPLS GSO support
[   12.232356] registered taskstats version 1
[   12.233854] zswap: loaded using pool lzo/zbud
[   12.239243] rtc-m48t59 rtc-m48t59.0: setting system clock to 2017-01-06 16:52:10 UTC (1483721530)
[   12.697036] random: systemd-udevd urandom read with 0 bits of entropy available
[   13.796267] ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
[   13.840168] ne2k-pci 0000:00:04.0 eth0: RealTek RTL-8029 found at 0x1fe02008000, IRQ 6, 52:54:00:12:34:56.
[   13.860059] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
[   14.710597] ne2k-pci 0000:00:04.0 enp0s4: renamed from eth0
[   14.714208]  vda: vda1 vda2 vda3 vda4 vda5 vda6 vda7 vda8
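
The "leaving for legacy driver" line appears to come from the modern virtio_pci probe. Assuming the 4.3 driver matches the virtio_pci_modern.c sources of that era, the probe prints this line when the device exposes no virtio common-configuration capability. That is what disable-modern=on should produce, so the legacy driver binds instead. Below is a minimal sketch of that discovery step, walking a 256-byte copy of PCI configuration space. find_virtio_cap() and wants_modern_driver() are hypothetical helpers, while the constant values follow the PCI and virtio 1.0 specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration-space offsets and IDs. */
#define PCI_STATUS		0x06	/* status register (low byte) */
#define PCI_STATUS_CAP_LIST	0x10	/* capability list present */
#define PCI_CAPABILITY_LIST	0x34	/* pointer to first capability */
#define PCI_CAP_ID_VNDR		0x09	/* vendor-specific capability */
/* virtio 1.0 cfg_type for the common configuration structure. */
#define VIRTIO_PCI_CAP_COMMON_CFG	1

/* Walk the capability list in a 256-byte config-space image and return
 * the offset of the first vendor-specific capability with the given
 * cfg_type, or 0 if there is none. */
static uint8_t find_virtio_cap(const uint8_t cfg[256], uint8_t cfg_type)
{
	assert(cfg != NULL);
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;
	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	/* Bounded walk so a malformed, looping list cannot hang the caller. */
	for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
		/* virtio_pci_cap: cap_vndr, cap_next, cap_len, cfg_type, bar, ... */
		if (cfg[pos] == PCI_CAP_ID_VNDR && cfg[pos + 3] == cfg_type)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Probe decision sketch: bind the modern driver only if a common
 * configuration capability exists, otherwise leave it to legacy. */
static int wants_modern_driver(const uint8_t cfg[256])
{
	return find_virtio_cap(cfg, VIRTIO_PCI_CAP_COMMON_CFG) != 0;
}
```

Consider a config space whose capability list holds an MSI-X entry at 0x40 pointing to a vendor-specific capability with cfg_type 1 at 0x50. For that layout, find_virtio_cap() returns 0x50. Clear the capability-list bit in the status register and it returns 0, which is the "legacy" outcome.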


Guenter, can you try a similar command line and confirm whether it fixes
the issue for you under QEMU 2.7 and 2.8? I have no idea why the
difference between the legacy and non-legacy code paths should crash the
kernel, though.
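
One concrete difference between the two code paths is where device_status lives. The legacy interface places it at offset 18 of the I/O port BAR (VIRTIO_PCI_STATUS in Linux). The modern interface places it at offset 20 (0x14) of the common configuration structure, which the driver locates through a vendor-specific PCI capability and which usually sits in a memory BAR. The sketch below only derives those two offsets, assuming a hypothetical struct common_cfg_head that mirrors the first fields of the virtio 1.0 struct virtio_pci_common_cfg. It does not explain the crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy interface: device status register offset in the I/O BAR
 * (the value Linux names VIRTIO_PCI_STATUS). */
#define LEGACY_STATUS_OFFSET	18

/* Hypothetical mirror of the first fields of the virtio 1.0
 * struct virtio_pci_common_cfg; later queue fields are omitted. */
struct common_cfg_head {
	uint32_t device_feature_select;
	uint32_t device_feature;
	uint32_t guest_feature_select;
	uint32_t guest_feature;
	uint16_t msix_config;
	uint16_t num_queues;
	uint8_t  device_status;
	uint8_t  config_generation;
	uint16_t queue_select;
};

/* Fail the build if the mirror drifts from the specified layout. */
static_assert(offsetof(struct common_cfg_head, device_status) == 0x14,
	      "device_status must sit at offset 0x14");

/* Offset of device_status within the region the driver accesses:
 * the common configuration structure (modern) or the I/O BAR (legacy). */
static size_t status_offset(int modern)
{
	return modern ? offsetof(struct common_cfg_head, device_status)
		      : LEGACY_STATUS_OFFSET;
}
```

status_offset(1) evaluates to 20 and status_offset(0) to 18. The static_assert makes the compiler reject the mirror struct if its layout ever drifts from the specification.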


ATB,

Mark.