Re: [Qemu-devel] qemu-system-ppc hangs

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] qemu-system-ppc hangs

From:	Richard Purdie
Subject:	Re: [Qemu-devel] qemu-system-ppc hangs
Date:	Tue, 21 Nov 2017 17:33:37 +0000

On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote:
> Richard Purdie <address@hidden> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things but we have
> plenty
> of examples in the code of messing with cpu->interrupt_request
> directly
> which is often why the assert() in cpu_interrupt() doesn't get a
> chance
> to fire.
> 
> The two most prolific direct users seems to be i386 and ppc.
> 
> The i386 cases are all most likely OK as it tends to be in KVM
> emulation
> code where by definition the BQL is already held by the time you get
> there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
> 
>   Emulated hardware state
>   -----------------------
> 
>   Currently thanks to KVM work any access to IO memory is
> automatically
>   protected by the global iothread mutex, also known as the BQL (Big
>   Qemu Lock). Any IO region that doesn't use global mutex is expected
> to
>   do its own locking.
> 
>   However IO memory isn't the only way emulated hardware state can be
>   modified. Some architectures have model specific registers that
>   trigger hardware emulation features. Generally any translation
> helper
>   that needs to update more than a single vCPUs of state should take
> the
>   BQL.
> 
>   As the BQL, or global iothread mutex is shared across the system we
>   push the use of the lock as far down into the TCG code as possible
> to
>   minimise contention.
> 
>   (Current solution)
> 
>   MMIO access automatically serialises hardware emulation by way of
> the
>   BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses
>   and also defer the reset/startup of vCPUs to the vCPU context by
> way
>   of async_run_on_cpu().
> 
>   Updates to interrupt state are also protected by the BQL as they
> can
>   often be cross vCPU.
> 
> So basically it comes down to the call-path into your final
> cpu_interrupt() call which is why I guess your doing the:
> 
>   if (!qemu_mutex_iothread_locked()) {
>      qemu_mutex_lock_iothread();
>      cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>      qemu_mutex_unlock_iothread();
>   }
> 
> dance. I suspect the helper functions are called both from device
> emulation (where BQL is held) and from vCPU helpers (where it is
> currently not).
> 
> So I suggest the fix is:
> 
>  1. Convert sites doing their own manipulation of
>  cpu->interrupt_request() to call the helper instead
>  2. If helper directly called from TCG code:
>       - take BQL before calling cpu_interrupt(), release after
>     Else if helper shared between MMIO/TCG paths
>       - take BQL from TCG path, call helper, release after
> 
> It might be there is some sensible re-factoring that could be done to
> make things clearer but I'll leave that to the PPC experts to weigh
> in
> on.
> 
> Hope that helps!

It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if its the right thing to do.
Certainly its working fine locally.

Cheers,

Richard

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] qemu-system-ppc hangs, Richard Purdie, 2017/11/20
- Re: [Qemu-devel] qemu-system-ppc hangs, Mark Cave-Ayland, 2017/11/21
  - Re: [Qemu-devel] qemu-system-ppc hangs, Richard Purdie, 2017/11/21
- Re: [Qemu-devel] qemu-system-ppc hangs, Alex Bennée, 2017/11/21
  - Re: [Qemu-devel] qemu-system-ppc hangs, Richard Purdie <=

Prev by Date: [Qemu-devel] [PATCH] target/ppc: Fix system lockups caused by interrupt_request state corruption
Next by Date: Re: [Qemu-devel] [PATCH v2 1/7] s390x/pci: factor out endianess conversion
Previous by thread: Re: [Qemu-devel] qemu-system-ppc hangs
Next by thread: [Qemu-devel] [PATCH v2 for-2.11 0/4] Fix segfault in blockjob race condition
Index(es):
- Date
- Thread