Re: [Qemu-devel] coroutine-ucontext broken for x86-32


From: Jan Kiszka
Subject: Re: [Qemu-devel] coroutine-ucontext broken for x86-32
Date: Wed, 09 May 2012 08:12:25 -0300
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2012-05-09 04:32, Michael Tokarev wrote:
> On 08.05.2012 23:35, Jan Kiszka wrote:
>> Hi,
>>
>> I hunted down a fairly subtle corruption of the VCPU thread signal mask
>> in KVM mode when using the ucontext version of coroutines:
>>
>> coroutine_new calls getcontext, makecontext, swapcontext. Those
>> functions also get/set the signal mask of the caller. Unfortunately,
>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask
>> version. So they do not properly save/restore the blocked RT signals,
>> namely our SIG_IPI - it becomes unblocked this way. And this will sooner
>> or later make the kernel actually deliver a SIG_IPI to our
>> dummy_handler, and we miss a wakeup, which means losing control over
>> the VCPU thread - qemu hangs.
>>
>> I was able to reproduce the issue very reliably with virtio-block
>> enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP
>> guest.
> 
> Jan, I have been trying to hunt down (well, FSVO anyway, since I still
> don't understand qemu code as a whole) this very issue since some 0.15
> version (IIRC - when coroutines were introduced).  The symptom I faced
> was a 32bit kvm process lockup when rebooting a windows guest.  The
> cause was lost/ignored interrupts, and for me it was possible to unstick
> it by just suspending/resuming (SIGSTOP/SIGCONT) the kvm process or by
> attaching a debugger or strace to it.  It looked like a corruption
> somewhere, and while bisecting I kept finding "unrelated" commits --
> like, eg, "switch qcow2 to coroutines" (I was using -snapshot, so qcow2
> was actually in use, but the commit itself was innocent).  There
> are several discussions in the archives, a Debian bug report about it
> and several IRC discussions, all with no outcome.  So at least now I
> can say that it is not only me who sees the issue, so it passes a
> reality check somehow... ;)
> 
> But the thing is: generally, almost no one cares about 32/64bit
> "mixed" environments anymore.  I had a few users in Debian who
> complained, and it has always been the same scenario: an old 32bit
> install moved to new hardware, then, due to the large amount of
> memory, a switch to a 64bit kernel, and the result is "something
> not working".  My suggestion to them has always been "reinstall".
> I use such a mixed environment myself on my development box
> (and actually even on production machines @office), so I'm
> one of the first to face issues in this area, and it sometimes
> does not let me do other things -- eg, I can't debug some
> other bug because qemu locks up due to this 32/64 thing.  I
> learned to use a 64bit chroot for these things after all.
> 
> So I'm not sure if there's enough interest to hunt this.  It
> must be something very simple, and it might pop up somewhere
> else, but so far it - seemingly - only affects 32/64bit mixed
> environment.

This issue also affects 32/32 installations.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux