[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] Holding the BQL for emulate_ppc_hypercall
From: |
Alex Bennée |
Subject: |
Re: [Qemu-ppc] Holding the BQL for emulate_ppc_hypercall |
Date: |
Tue, 25 Oct 2016 09:36:04 +0100 |
User-agent: |
mu4e 0.9.17; emacs 25.1.50.10 |
David Gibson <address@hidden> writes:
> On Mon, Oct 24, 2016 at 03:44:01PM +0100, Alex Bennée wrote:
>>
>> Alex Bennée <address@hidden> writes:
>>
>> > Hi,
>> >
>> > In the MTTCG patch set one of the big patches is to remove the
>> > requirement to hold the BQL while running code:
>> >
>> > tcg: drop global lock during TCG code execution
>> >
>> > And this broke the PPC code because emulate_ppc_hypercall can cause
>> > changes to the global state. This function just calls spapr_hypercall()
>> > and puts the results into the TCG register file. Normally
>> > spapr_hypercall() is called under the BQL in KVM as
>> > kvm_arch_handle_exit() does things with the BQL held.
>> >
>> > I blithely wrapped the called in a lock/unlock pair only to find the
>> > ppc64 check builds failed as the hypercall was made during the
>> > cc->do_interrupt() code which also holds the BQL.
>> >
>> > I'm a little confused by the nature of PPC hypercalls in TCG? Are they
>> > not all detectable at code generation time? What is the case that causes
>> > an exception to occur rather than the helper function doing the
>> > hypercall?
>> >
>> > I guess it comes down to can I avoid doing:
>> >
>> > /* If we come via cc->do_interrupt BQL may already be held */
>> > if (!qemu_mutex_iothread_locked()) {
>> > g_mutex_lock_iothread();
>> > env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> > g_muetx_unlock_iothread();
>> > } else {
>> > env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> > }
>>
>> Of course I mean:
>>
>> /* If we come via cc->do_interrupt BQL may already be held */
>> if (!qemu_mutex_iothread_locked()) {
>> qemu_mutex_lock_iothread();
>> env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> qemu_mutex_unlock_iothread();
>> } else {
>> env->gpr[3] = spapr_hypercall(cpu, env->gpr[3], &env->gpr[4]);
>> }
>>
>> > Any thoughts?
>
> So, I understand why the hypercall is being called from exception code
> and therefore with the BQL held. On Power, the hypercall instruction
> is the same as the guest-level system call instruction, just with a
> flag bit set. System calls are, of course, treated as exceptions,
> because they change the CPU's privilege mode. Likewise if we were
> implementing a full host system (like the upcoming 'powernv' machine
> type) we'd need to treat hypercalls as exceptions for the same reason.
>
> We could detect hypercalls at translation time, but at present we
> don't: we go into the exception path, then detect that it's a "level
> 1" (i.e. hypervisor) sc instruction and branch off to the hypercall
> emulation code if that's been set up. It just seemed the simplet
> approach at the time.
>
> What I *don't* understand is how the hypercall code is ever being
> invoked *without* the BQL. I grepped through and the only entry paths
> I can see are the one in the exception handling and KVM.
Ahh I think I have figured out what happened.
In the pre-MTTCG world TCG code always held the BQL. After Jan's patch
the BQL was held for interrupt processing but not exception handling.
I've since fixed this. I think I must have tried this hack before fixing
the root cause because I had another assertion failure when trying to
ensure all IRQ processing has the BQL held (as the target specific code
can do all sorts of things on an IRQ/Exception).
> Could you try to get a backtrace from the case where we're entering
> the hypercall without the BQL?
I'll remove the hack, add an assert and see if it comes up again in
testing.
--
Alex Bennée