
From: Peter Lieven
Subject: Re: [Qemu-devel] race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
Date: Thu, 28 Jun 2012 12:13:20 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Thunderbird/3.1.15

On 28.06.2012 11:39, Jan Kiszka wrote:
On 2012-06-28 11:31, Peter Lieven wrote:
On 28.06.2012 11:21, Jan Kiszka wrote:
On 2012-06-28 11:11, Peter Lieven wrote:
On 27.06.2012 18:54, Jan Kiszka wrote:
On 2012-06-27 17:39, Peter Lieven wrote:
Hi all,

I debugged this further and found out that kvm-kmod-3.0 works with
qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 do not. What also
works is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
Does anyone have a clue which new KVM feature could cause this when a vcpu is in
an infinite loop?
Before accusing kvm-kmod ;), can you check if the effect is visible with
an original Linux 3.3.x or 3.4.x kernel as well?
Sorry, I should have been more specific; maybe I also misunderstood something.
I was under the impression that kvm-kmod-3.0 is basically what is in the vanilla
3.0 kernel. If I use the Ubuntu kernel from Ubuntu Oneiric (3.0.0) it works; if
I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't.
However, maybe we don't have to dig too deep - see below.
kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4
work on an older kernel. This step may introduce bugs of its own.
Hence my suggestion to use a "real" 3.x kernel first of all, to exclude
that risk.

Then, bisecting the change in qemu-kvm that apparently resolved the
issue would be interesting.

If we have to dig deeper, tracing [1] the lockup would likely be helpful
(all events of the qemu process, not just KVM-related ones: trace-cmd
record -e all qemu-system-x86_64 ...).
This here is basically what's going on:

  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio read len 3 gpa 0xa0000 val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:      gva 0xa0000 gpa 0xa0000 Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio read len 3 gpa 0xa0000 val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:      gva 0xa0000 gpa 0xa0000 Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio read len 3 gpa 0xa0000 val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:      gva 0xa0000 gpa 0xa0000 Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason KVM_EXIT_MMIO (6)
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio read len 3 gpa 0xa0000 val 0x10ff
  qemu-kvm-1.0-2506  [010] 60996.908000: vcpu_match_mmio:      gva 0xa0000 gpa 0xa0000 Read GPA
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_mmio:             mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
  qemu-kvm-1.0-2506  [010] 60996.908000: kvm_userspace_exit:   reason KVM_EXIT_MMIO (6)

It's doing that forever. This is tracing the kvm module; doing the
qemu-system-x86_64 trace is a bit complicated, but maybe this is already
sufficient. Otherwise I will of course gather that info as well.
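
For reference, each kvm_userspace_exit with reason KVM_EXIT_MMIO means the
kernel could not complete the MMIO access itself and hands it to userspace
for emulation before re-entering the guest. A minimal sketch of that
dispatch, based on the generic KVM API rather than qemu-kvm's actual code
(vcpu_fd and the surrounding VM setup are assumed to exist):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>

/* Minimal sketch, not qemu-kvm's code: one iteration of this loop
 * corresponds to one kvm_userspace_exit event in the trace above. */
static void run_vcpu(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
            perror("KVM_RUN");
            return;
        }
        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:
            /* Userspace has to emulate the access at run->mmio.phys_addr
             * (0xa0000 above) and then re-enter the guest; if nothing
             * sensible backs that address, the guest just retries and
             * the pattern repeats forever. */
            printf("mmio %s gpa 0x%llx len %u\n",
                   run->mmio.is_write ? "write" : "read",
                   (unsigned long long)run->mmio.phys_addr,
                   run->mmio.len);
            break;
        default:
            printf("unhandled exit reason %u\n", run->exit_reason);
            return;
        }
    }
}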
That's only tracing KVM events, and it's tracing after things have already
gone wrong. We may need a full trace (-e all), specifically for the period
when the pattern above started.
I will do that. Maybe I should explain that the vcpu is executing
garbage when the above starts; it's basically booting from an empty
hard disk.

If I understand correctly, qemu-kvm loops in kvm_cpu_exec(CPUState *env);

maybe the time to handle the monitor/QMP connection is just too short.
If I further understand correctly, qemu-kvm can only handle monitor
connections while it is executing kvm_vcpu_ioctl(env, KVM_RUN, 0) - or am I
wrong here? The time spent in this state might be rather short.
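
To illustrate the point, here is a heavily abridged sketch of the structure
I believe kvm_cpu_exec() has in qemu-kvm 1.0 (from memory, not the verbatim
code): the global iothread mutex is dropped only while the vcpu sits inside
the KVM_RUN ioctl, so that is the main window in which the monitor/QMP can
make progress.

/* Abridged sketch, not the verbatim qemu-kvm 1.0 code. */
int kvm_cpu_exec(CPUState *env)
{
    struct kvm_run *run = env->kvm_run;
    int ret, run_ret;

    do {
        /* The iothread (monitor, QMP, timers) can only grab the global
         * mutex while the vcpu thread is blocked in KVM_RUN here. */
        qemu_mutex_unlock_iothread();
        run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
        qemu_mutex_lock_iothread();

        if (run_ret < 0) {
            ret = -1;
            break;
        }

        switch (run->exit_reason) {
        case KVM_EXIT_MMIO:
            /* Complete the MMIO access in userspace and loop straight
             * back into the guest - the KVM_EXIT_MMIO (6) pattern from
             * the trace above. */
            cpu_physical_memory_rw(run->mmio.phys_addr, run->mmio.data,
                                   run->mmio.len, run->mmio.is_write);
            ret = 0;
            break;
        default:
            ret = kvm_arch_handle_exit(env, run);
            break;
        }
    } while (ret == 0);

    return ret;
}

If that picture is right, even a tight KVM_EXIT_MMIO loop should still give
the iothread a chance to run on every iteration, since the mutex is released
around each KVM_RUN.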
Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.
I have a 1.1 GB (85 MB compressed) trace file. If you have time to
look at it I could drop it somewhere.

We currently run all VMs with nice 1 because we observed that
this improves the controllability of the node in case all VMs
have excessive CPU load. Running the VM un-niced unfortunately does
not change the behaviour.

Peter
My concern is not that the machine hangs, just that the hypervisor is
unresponsive and it's impossible to reset or quit gracefully. The only
way to end the hypervisor is via SIGKILL.
Right. Even if the guest runs wild, you must be able to control the VM
via the monitor etc. If not, that's a bug.

Jan




