qemu-devel
Re: [Qemu-devel] NVDIMM live migration broken?


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] NVDIMM live migration broken?
Date: Mon, 26 Jun 2017 13:56:51 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

On Mon, Jun 26, 2017 at 10:05:01AM +0800, Haozhong Zhang wrote:
> On 06/23/17 10:55 +0100, Stefan Hajnoczi wrote:
> > On Fri, Jun 23, 2017 at 08:13:13AM +0800, address@hidden wrote:
> > > On 06/22/17 15:08 +0100, Stefan Hajnoczi wrote:
> > > > I tried live migrating a guest with NVDIMM on qemu.git/master (edf8bc984):
> > > > 
> > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > >          -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > >          -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > >          -drive if=virtio,file=test.img,format=raw
> > > > 
> > > >   $ qemu -M accel=kvm,nvdimm=on -m 1G,slots=4,maxmem=8G -cpu host \
> > > >          -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
> > > >          -device nvdimm,id=nvdimm1,memdev=mem1 \
> > > >          -drive if=virtio,file=test.img,format=raw \
> > > >          -incoming tcp::1234
> > > > 
> > > >   (qemu) migrate tcp:127.0.0.1:1234
> > > > 
> > > > The guest kernel panics or hangs every time on the destination.  It
> > > > happens as long as the nvdimm device is present - I didn't even mount it
> > > > inside the guest.
> > > > 
> > > > Is migration expected to work?
> > > 
> > > Yes, I tested on QEMU 2.8.0 several months ago and it worked. I'll
> > > have a look at this issue.
> > 
> > Great, thanks!
> > 
> > David Gilbert suggested the following on IRC, it sounds like a good
> > starting point for debugging:
> > 
> > Launch the destination QEMU with -S (vcpus will be paused) and after
> > migration has completed, compare the NVDIMM contents on source and
> > destination.
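The comparison step can be sketched roughly as follows (a sketch only; the file names are placeholders, not from the thread). Since the NVDIMM is created with share=on, its contents live in the mem-path backing file, so once migration completes (with the destination still paused via -S) the two backing files can be compared byte-for-byte:

```shell
# Hedged sketch of the comparison; file names below are placeholders.
# On the real hosts this would be something like:
#
#   cmp /path/on/source/nvdimm.dat /path/on/dest/nvdimm.dat
#
# Demonstrated locally with dummy 1 MiB files standing in for the real ones:
dd if=/dev/zero of=src-nvdimm.dat bs=1M count=1 2>/dev/null
cp src-nvdimm.dat dst-nvdimm.dat
if cmp -s src-nvdimm.dat dst-nvdimm.dat; then
    echo "NVDIMM backing files match"
else
    echo "NVDIMM backing files differ"
fi
```

For ordinary guest RAM the same idea applies with dump-guest-memory from each QEMU monitor, comparing the resulting dumps.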
> > 
> 
> Which host and guest kernels are you testing? Is any workload running
> in the guest during migration?
> 
> I just tested QEMU commit edf8bc984 with host/guest kernel 4.8.0, and
> could not reproduce the issue.

I can still reproduce the problem on qemu.git edf8bc984.

My guest kernel is fairly close to yours.  The host kernel is newer.

Host kernel: 4.11.6-201.fc25.x86_64
Guest kernel: 4.8.8-300.fc25.x86_64

Command-line:

  qemu-system-x86_64 \
      -enable-kvm \
      -cpu host \
      -machine pc,nvdimm \
      -m 1G,slots=4,maxmem=8G \
      -object memory-backend-file,id=mem1,share=on,mem-path=nvdimm.dat,size=1G \
      -device nvdimm,id=nvdimm1,memdev=mem1 \
      -drive if=virtio,file=test.img,format=raw \
      -display none \
      -serial stdio \
      -monitor unix:/tmp/monitor.sock,server,nowait

Start migration at the guest login prompt.  You don't need to log in or
do anything inside the guest.
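In case it helps reproduce: the migration can be kicked off non-interactively through the monitor socket from the command line above (a sketch; it assumes socat is available and the socket path matches the -monitor option):

```shell
# Sketch only: send the migrate command through the monitor socket declared
# with "-monitor unix:/tmp/monitor.sock,server,nowait" above. Assumes socat
# is installed; uncomment the socat line with both QEMU instances running.
MON=/tmp/monitor.sock
CMD="migrate tcp:127.0.0.1:1234"
# echo "$CMD" | socat - "UNIX-CONNECT:$MON"
echo "would send '$CMD' to $MON"
```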

There seems to be guest RAM corruption, because I get different
backtraces inside the guest every time.

The problem goes away if I remove -device nvdimm.

Here is an example backtrace:

[   28.577138] BUG: Bad rss-counter state mm:ffff9a21fd38aec0 idx:0 val:2605
[   28.577954] BUG: Bad rss-counter state mm:ffff9a21fd38aec0 idx:1 val:503
[   28.578646] BUG: non-zero nr_ptes on freeing mm: 73
[   28.579133] BUG: non-zero nr_pmds on freeing mm: 4
[   28.579932] BUG: unable to handle kernel paging request at ffff9a2100000000
[   28.581174] IP: [<ffffffffbe227723>] __kmalloc+0xc3/0x1f0
[   28.582015] PGD 3327c067 PUD 0 
[   28.582549] Oops: 0000 [#1] SMP
[   28.583032] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc 
ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_security iptable_raw iptable_mangle iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bochs_drm 
ttm drm_kms_helper snd_pcsp dax_pmem nd_pmem crct10dif_pclmul dax nd_btt 
crc32_pclmul ppdev snd_pcm ghash_clmulni_intel drm e1000 snd_timer snd 
soundcore acpi_cpufreq joydev i2c_piix4 tpm_tis parport_pc tpm_tis_core parport 
qemu_fw_cfg tpm nfit xfs libcrc32c virtio_blk crc32c_intel virtio_pci serio_raw 
virtio_ring virtio ata_generic pata_acpi
[   28.592394] CPU: 0 PID: 573 Comm: systemd-journal Not tainted 
4.8.8-300.fc25.x86_64 #1
[   28.593124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[   28.594208] task: ffff9a21f67e5b80 task.stack: ffff9a21fd0c0000
[   28.594752] RIP: 0010:[<ffffffffbe227723>]  [<ffffffffbe227723>] 
__kmalloc+0xc3/0x1f0
[   28.595485] RSP: 0018:ffff9a21fd0c3740  EFLAGS: 00010046
[   28.595976] RAX: ffff9a2100000000 RBX: 0000000002080020 RCX: 000000000000007f
[   28.596644] RDX: 0000000000010bf2 RSI: 0000000000000000 RDI: 000000000001c980
[   28.597311] RBP: ffff9a21fd0c3770 R08: ffff9a21ffc1c980 R09: 0000000002080020
[   28.597971] R10: ffff9a2100000000 R11: 0000000000000008 R12: 0000000002080020
[   28.598637] R13: 0000000000000030 R14: ffff9a21fe0018c0 R15: ffff9a21fe0018c0
[   28.599301] FS:  00007fd95ae4c700(0000) GS:ffff9a21ffc00000(0000) 
knlGS:0000000000000000
[   28.600050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.600587] CR2: ffff9a2100000000 CR3: 000000003715f000 CR4: 00000000003406f0
[   28.601250] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   28.601908] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   28.602574] Stack:
[   28.602754]  ffffffffc03dde4d 0000000000000003 ffff9a21fd0c38e0 
000000000000001c
[   28.603493]  ffff9a21f6cfb000 ffff9a21fd0c38c8 ffff9a21fd0c3788 
ffffffffc03dde4d
[   28.604217]  0000000000000003 ffff9a21fd0c3800 ffffffffc03de043 
ffff9a21fd0c38c8
[   28.604942] Call Trace:
[   28.605185]  [<ffffffffc03dde4d>] ? alloc_indirect.isra.14+0x1d/0x50 
[virtio_ring]
[   28.605890]  [<ffffffffc03dde4d>] alloc_indirect.isra.14+0x1d/0x50 
[virtio_ring]
[   28.606561]  [<ffffffffc03de043>] virtqueue_add_sgs+0x1c3/0x4a0 [virtio_ring]
[   28.607086]  [<ffffffffc040165c>] __virtblk_add_req+0xbc/0x220 [virtio_blk]
[   28.607614]  [<ffffffffbe3fbb3d>] ? find_next_zero_bit+0x1d/0x20
[   28.608060]  [<ffffffffbe3c2e57>] ? __bt_get.isra.6+0xd7/0x1c0
[   28.608506]  [<ffffffffc040195d>] virtio_queue_rq+0x12d/0x290 [virtio_blk]
[   28.609013]  [<ffffffffbe3c06b3>] __blk_mq_run_hw_queue+0x233/0x380
[   28.609565]  [<ffffffffbe3b2101>] ? blk_run_queue+0x21/0x40
[   28.610087]  [<ffffffffbe3c045b>] blk_mq_run_hw_queue+0x8b/0xb0
[   28.610649]  [<ffffffffbe3c1926>] blk_sq_make_request+0x216/0x4d0
[   28.611225]  [<ffffffffbe3b5782>] generic_make_request+0xf2/0x1d0
[   28.611796]  [<ffffffffbe3b58dd>] submit_bio+0x7d/0x150
[   28.612297]  [<ffffffffbe1c6797>] ? __test_set_page_writeback+0x107/0x220
[   28.612952]  [<ffffffffc045b644>] xfs_submit_ioend.isra.14+0x84/0xd0 [xfs]
[   28.613617]  [<ffffffffc045bbfe>] xfs_do_writepage+0x26e/0x5f0 [xfs]
[   28.614219]  [<ffffffffbe1c8425>] write_cache_pages+0x205/0x530
[   28.614789]  [<ffffffffc045b990>] ? xfs_aops_discard_page+0x140/0x140 [xfs]
[   28.615460]  [<ffffffffc045b73b>] xfs_vm_writepages+0xab/0xd0 [xfs]
[   28.616052]  [<ffffffffbe1c940e>] do_writepages+0x1e/0x30
[   28.616569]  [<ffffffffbe1ba5c6>] __filemap_fdatawrite_range+0xc6/0x100
[   28.617192]  [<ffffffffbe1ba741>] filemap_write_and_wait_range+0x41/0x90
[   28.617832]  [<ffffffffc0465c23>] xfs_file_fsync+0x63/0x1d0 [xfs]
[   28.618415]  [<ffffffffbe285289>] vfs_fsync_range+0x49/0xa0
[   28.618940]  [<ffffffffbe28533d>] do_fsync+0x3d/0x70
[   28.619411]  [<ffffffffbe2855d0>] SyS_fsync+0x10/0x20
[   28.619887]  [<ffffffffbe003c57>] do_syscall_64+0x67/0x160
[   28.620410]  [<ffffffffbe802861>] entry_SYSCALL64_slow_path+0x25/0x25
[   28.621017] Code: 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 
c5 00 00 00 49 63 47 20 49 8b 3f 4c 01 d0 40 f6 c7 0f 0f 85 1a 01 00 00 <48> 8b 
18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 
[   28.623292] RIP  [<ffffffffbe227723>] __kmalloc+0xc3/0x1f0
[   28.623712]  RSP <ffff9a21fd0c3740>
[   28.623975] CR2: ffff9a2100000000
[   28.624275] ---[ end trace 60d3c1e57c22eb41 ]---


