[Qemu-discuss] KVM guest gets aborted if blockcommit is called
From: Christian Rößner
Date: Mon, 24 Aug 2015 22:45:16 +0200
Hello,
I have now spent five full days debugging a major problem with backing up VMs.
I run an HP ProLiant server (SE316M1-R2, aka DL160G6) with two Xeon L5520 CPUs
and 48GB of triple-channel RAM. The server is used for monitoring and for
Qemu/libvirt. It hosts 7 guests and runs Gentoo Linux (hardened;
Grsecurity-patched kernel, PaX, no RBAC).
All guests use raw images as disks (I also tested QED and QCOW2). The guest
systems are all Gentoo or Ubuntu, and all of them have qemu-guest-agent running.
app-emulation/libvirt-1.2.18-r1::gentoo was built with the following:
USE="caps fuse iscsi libvirtd lvm lxc macvtap nfs nls parted pcap qemu sasl
systemd udev vepa -apparmor -audit -avahi -firewalld -glusterfs -numa -openvz
-phyp -policykit -rbd (-selinux) -uml -virt-network -virtualbox
(-wireshark-plugins) -xen"
app-emulation/qemu-2.4.0::gentoo was built with the following:
USE="aio caps curl fdt filecaps jpeg ncurses nls pin-upstream-blobs png python
sasl seccomp spice ssh threads tls uuid vhost-net vnc xattr -accessibility
-alsa -bluetooth -debug -glusterfs -gtk -gtk2 -infiniband -iscsi -lzo -nfs
-numa -opengl -pulseaudio -rbd -sdl -sdl2 (-selinux) -smartcard -snappy -static
-static-softmmu -static-user -systemtap -tci -test -usb -usbredir -vde -virtfs
-vte -xen -xfs" PYTHON_TARGETS="python2_7" QEMU_SOFTMMU_TARGETS="i386 x86_64
-aarch64 (-alpha) (-arm) -cris -lm32 (-m68k) -microblaze -microblazeel (-mips)
-mips64 -mips64el -mipsel -moxie -or32 (-ppc) (-ppc64) -ppcemb -s390x -sh4
-sh4eb (-sparc) -sparc64 -unicore32 -xtensa -xtensaeb" QEMU_USER_TARGETS="i386
x86_64 -aarch64 (-alpha) (-arm) -armeb -cris (-m68k) -microblaze -microblazeel
(-mips) -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 (-ppc) (-ppc64)
-ppc64abi32 -s390x -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -unicore32"
I wrote a bash script that backs up all guests. It works like this:
1. Create external snapshot
2. Copy/rsync away the image
3. blockcommit snapshot
4. blockjob pivot
5. Copy/rsync away the XML description for the guest
6. Remove Snapshot file
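The six steps above might be sketched in bash roughly as follows. This is a
hypothetical outline, not the script used here: the function names `run` and
`backup_guest`, the snapshot name `backup-snapshot`, the `DRY_RUN` switch and
all arguments are assumptions made for illustration. It assumes `virsh` and
`rsync` are installed, and it uses `--quiesce`, which relies on the
qemu-guest-agent inside the guest.

```shell
#!/bin/bash
# Hypothetical sketch of the six backup steps; not the original script.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+'; printf ' %s' "$@"; printf '\n'
    else
        "$@"
    fi
}

# backup_guest DOMAIN DISK IMAGE SNAPSHOT_FILE DEST_DIR  (all placeholders)
backup_guest() {
    local dom disk image snap dest rc
    dom=$1; disk=$2; image=$3; snap=$4; dest=$5; rc=0

    # 1. Create an external, disk-only snapshot; new writes go to $snap.
    run virsh snapshot-create-as "$dom" backup-snapshot \
        --disk-only --atomic --quiesce --no-metadata \
        --diskspec "$disk,file=$snap" || return 1

    # 2. Copy the base image, which the guest no longer writes to.
    #    A failed copy is remembered, but the commit below still runs.
    run rsync -a --sparse "$image" "$dest/" || rc=1

    # 3. Merge the overlay back into the base image (active commit).
    run virsh blockcommit "$dom" "$disk" --active --wait --verbose || return 1

    # 4. Switch the guest back to the base image.
    run virsh blockjob "$dom" "$disk" --pivot || return 1

    # 5. Save the XML description of the guest.
    run sh -c 'virsh dumpxml "$1" > "$2"' sh "$dom" "$dest/$dom.xml" || rc=1

    # 6. Remove the now unused snapshot file.
    run rm -f "$snap"

    return "$rc"
}
```

A dry run with placeholder values, which only prints the commands, could look
like this:
DRY_RUN=1 backup_guest demo vda /images/demo.img /snapshots/demo.qcow2 /backup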
I tested the script by running it from a cron job. Copying the image file takes
roughly 15 minutes, so I scheduled the script to run every 30 minutes.
4 or 5 cycles work perfectly. Steps (1) and (2) succeed, but when it comes to
blockcommit, the guest is sometimes (at random) aborted, and the command cannot
continue because the guest is no longer running. After starting the guest
again, I found two situations:
1. I can directly call blockjob … --pivot, because the last blockcommit that
failed reached 100%, or
2. I have to run a blockjob abort action, then re-sync and pivot on the command
line, which might work.
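The two recovery cases can be written down as a small sketch. The function
name `recover_commit` and its arguments are hypothetical, and "re-sync" is
assumed here to mean running blockcommit again. It only mirrors the manual
steps described above; it does not explain or prevent the abort.

```shell
#!/bin/bash
# Hypothetical sketch of the manual recovery; names are placeholders.

# recover_commit DOMAIN DISK
recover_commit() {
    local dom disk
    dom=$1; disk=$2

    # Case 1: the failed blockcommit had reached 100%, so pivot directly.
    if virsh blockjob "$dom" "$disk" --pivot; then
        return 0
    fi

    # Case 2: abort whatever job is left (ignore errors if there is none),
    # run the active commit again, then pivot.
    virsh blockjob "$dom" "$disk" --abort
    virsh blockcommit "$dom" "$disk" --active --wait --verbose || return 1
    virsh blockjob "$dom" "$disk" --pivot
}
```

It would be called after the guest has been started again, for example as
recover_commit demo vda (placeholder names).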
Anyway, blockcommit is not stable here. I tested this on qemu-2.3.0 and 2.4.0.
In the logs I only get this:
…
2015-08-24 18:38:13.077+0000: starting up libvirt version: 1.2.18, qemu
version: 2.4.0
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name
mx.roessner-net.de-TESTING -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu
qemu64,+kvm_pv_eoi -m 4096 -realtime mlock=off -smp
4,sockets=4,cores=1,threads=1 -uuid d86b82d5-153f-4dd9-aa66-d98c2e65db8c
-no-user-config -nodefaults -device sga -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/mx.roessner-net.de-TESTING.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
-global kvm-pit.lost_tick_policy=discard -no-shutdown -boot
order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive
file=/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img,if=none,id=drive-virtio-disk0,format=raw,cache=writeback
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:27:ac:8d,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/mx.roessner-net.de-TESTING.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-vnc 127.0.0.1:7 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -object
rng-random,id=objrng0,filename=/dev/random -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/8 (label charserial0)
Formatting
'/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2',
fmt=qcow2 size=107374182400
backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img'
backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off
refcount_bits=16
Formatting
'/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2',
fmt=qcow2 size=107374182400
backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img'
backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off
refcount_bits=16
Co-routine re-entered recursively
2015-08-24 19:43:17.700+0000: shutting down
I tried to find out what the error "Co-routine re-entered recursively" means,
but I have no idea. I only know that it comes from qemu-coroutine.c, line 111.
What causes this error? What am I missing?
I tried different Linux kernels: pure vanilla sources with NUMA balancing on
and off, and several Grsecurity kernels. The kernel makes no difference. The
Qemu version makes no difference. If I clean memory, I have roughly 36GB of
free memory. Storage is also fine: it is a BBU-backed P410i RAID controller
with RAID1+0 on 15k SAS disks. Even though this server is 6 years old, it has
enough power. So I don't think it is a resource or hardware problem. Everything
else on the server runs perfectly without any issues.
So if you have any idea what could cause these aborts, please let me know :-)
The only thing I found on the web is someone saying that this co-routine code
is ugly and probably not thread-safe. I don't remember where I found this
message. But could this be a threading problem?
Many, many thanks in advance
Christian