
Re: [Qemu-devel] [PATCH] stop cpus before forking.


From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH] stop cpus before forking.
Date: Mon, 14 Jun 2010 14:58:47 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Lightning/1.0b1 Thunderbird/3.0.4

On 06/14/2010 02:42 PM, Glauber Costa wrote:
On Mon, Jun 14, 2010 at 02:33:00PM -0500, Anthony Liguori wrote:
On 06/14/2010 02:27 PM, Glauber Costa wrote:
This patch fixes a bug that happens with kvm, irqchip-in-kernel,
while adding a netdev. Although the conditions for reproducing it are
specific to kvm, I believe this fix is fairly generic and fits here,
especially if we ever want to have our own in-kernel irqchip too.

The problem happens after the fork system call, and although it is not
100% reproducible, it happens quite often. After fork, the memory where
the apic is mapped is present in both processes. It ends up confusing
the vcpus somewhere in the irq <-> ack path, and qemu hangs, with no
irqs being delivered at all from that point on.
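
To illustrate what "present in both processes" can mean, here is a minimal sketch, assuming the region behaves like an ordinary MAP_SHARED mapping. Whether the in-kernel apic mapping actually behaves this way is an assumption and is not established in this thread; the function name is hypothetical. The sketch only shows that a shared mapping stays shared across fork(), so a write in the child is visible to the parent.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU or KVM code.
 * Returns 1 if a write made by the child shows up in the parent through
 * a MAP_SHARED mapping, 0 if it does not, and -1 on error.
 */
int shared_mapping_survives_fork(void)
{
    volatile int *page = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    *page = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)page, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *page = 42;              /* child writes into the shared page */
        _exit(0);
    }

    int status = 0;
    int seen = -1;
    if (waitpid(pid, &status, 0) == pid)
        seen = (*page == 42);    /* parent reads the child's write */

    munmap((void *)page, sizeof(int));
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the child would get a copy-on-write page and the parent would not observe the write.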

Making sure the vcpus are stopped before forking makes the problem go
away. Besides, this is a pretty infrequent operation, which already stalls
the io-thread for a while. So it should not hurt performance.

Signed-off-by: Glauber Costa <address@hidden>
This doesn't make very much sense to me, but it smells like a kernel bug.
My interpretation is that by doing that, we make sure no requests are
in flight. Actually, a sleep(x) with a sufficiently large x is enough
to make this problem go away, but that is too hacky.

vm_stop() is probably just acting as a glorified sleep(), since it has to wait for each thread to stop.
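
For reference, the general "park every thread, then fork" pattern looks roughly like the sketch below. This is not vm_stop() or any QEMU code; all names are hypothetical and the worker threads merely stand in for vcpu threads. Note that stop_workers() returns only after every worker has acknowledged the request, which is exactly the kind of wait that can act as a delay.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 2            /* hypothetical stand-ins for vcpu threads */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested;      /* set while workers must stay parked */
static bool quit;                /* set when workers should exit */
static int  parked;              /* number of workers currently parked */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!quit) {
        if (stop_requested) {
            parked++;
            pthread_cond_broadcast(&cond);      /* tell the stopper we parked */
            while (stop_requested && !quit)
                pthread_cond_wait(&cond, &lock);
            parked--;
            pthread_cond_broadcast(&cond);      /* tell the resumer we left */
        } else {
            pthread_mutex_unlock(&lock);
            sched_yield();                      /* placeholder for real work */
            pthread_mutex_lock(&lock);
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void stop_workers(void)
{
    pthread_mutex_lock(&lock);
    stop_requested = true;
    while (parked < NUM_WORKERS)                /* wait for every worker */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

static void resume_workers(void)
{
    pthread_mutex_lock(&lock);
    stop_requested = false;
    pthread_cond_broadcast(&cond);
    while (parked > 0)                          /* wait until all have resumed */
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Returns the child's exit status (0 on success) or -1 on error. */
int fork_with_workers_parked(void)
{
    pthread_t tids[NUM_WORKERS];

    quit = false;
    for (int i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            return -1;                          /* sketch: no cleanup here */

    stop_workers();                             /* no worker runs past here */
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                               /* child does nothing else */
    resume_workers();

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    pthread_mutex_lock(&lock);
    quit = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return result;
}
```

The pattern guarantees that no worker is mid-operation at the moment of fork(), but it says nothing about why a fork with running workers would misbehave in the first place.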

I do agree that this is most likely a kernel bug. But as with any other
kernel bug, I believe this is an easy workaround to keep things working
even on older kernels until we fix it.

If we don't know what the bug is, then we do not know whether this is a workaround. Rather, this change happens to make the bug more difficult to reproduce with your test case.

Even if it isn't, I can't rationalize why stopping the vm like this
is enough to fix such a problem.  Is the problem that the KVM VCPU
threads get duplicated while potentially running or something like
that?
I doubt fork is duplicating the vcpu threads. More than that, this
bug does not happen with userspace irqchip.
So I believe that either the irq request or the ack itself is reaching
the wrong process, forever stalling the apic.
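
On the thread-duplication question: POSIX specifies that the child of fork() contains only a copy of the calling thread. The sketch below is one way to observe that, under the assumption that a busy counter thread is a fair stand-in for a running vcpu thread; the names are hypothetical and this is not QEMU code. The counter keeps advancing in the parent but stays frozen in the child, because the ticker thread does not exist there.

```c
#include <poll.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_ulong ticks;       /* advanced only by the ticker thread */
static atomic_bool  stop_ticker;

static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_ticker))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/*
 * Returns 1 if the counter stayed frozen in the child (no ticker thread
 * was duplicated), 0 if it advanced there, and -1 on error.
 */
int child_has_no_ticker_thread(void)
{
    pthread_t tid;

    atomic_store(&stop_ticker, false);
    atomic_store(&ticks, 0);
    if (pthread_create(&tid, NULL, ticker, NULL) != 0)
        return -1;

    while (atomic_load(&ticks) == 0)   /* wait until the ticker is running */
        sched_yield();

    pid_t pid = fork();
    if (pid == 0) {
        unsigned long before = atomic_load(&ticks);
        poll(NULL, 0, 50);             /* sleep about 50 ms */
        unsigned long after = atomic_load(&ticks);
        _exit(before == after ? 0 : 1);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    atomic_store(&stop_ticker, true);
    pthread_join(tid, NULL);
    return result;
}
```

This supports the view that the vcpu threads themselves are not duplicated, which leaves shared state such as mappings and file descriptors as the more plausible point of interference.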

That sounds more like a signal delivery issue. It's not obvious to me that we're doing the wrong thing with the signal mask, though.

If it's a signal-mask-related issue, then vm_stop isn't a proper fix, as there would still be a race.
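
As background for the signal mask point: the child of fork() inherits the signal mask of the thread that called fork(). The sketch below checks that for a single signal. It is a generic illustration with a hypothetical function name and does not show what QEMU's masks actually look like or whether they are wrong.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigset_t and friends under strict -std modes */
#endif
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Blocks signo in the calling thread, forks, and asks the child whether
 * signo is blocked there too.
 * Returns 1 if the child inherited the blocked signal, 0 if not, -1 on error.
 */
int child_inherits_blocked_signal(int signo)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, signo);
    if (pthread_sigmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        sigset_t cur;
        sigemptyset(&cur);
        /* A NULL set only queries the current mask. */
        pthread_sigmask(SIG_BLOCK, NULL, &cur);
        _exit(sigismember(&cur, signo) == 1 ? 0 : 1);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return result;
}
```

Because the mask is copied at the instant of fork(), pausing other threads beforehand does not change which signals the child has blocked.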

Regards,

Anthony Liguori






