Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
Date: Tue, 11 Aug 2015 08:54:15 +0100

Benjamin Herrenschmidt <address@hidden> writes:

> On Mon, 2015-08-10 at 17:26 +0200, address@hidden wrote:
>> From: KONRAD Frederic <address@hidden>
>> 
>> This is the 7th round of the MTTCG patch series.
>> 
>> 
>> It can be cloned from:
>> address@hidden:fkonrad/mttcg.git branch multi_tcg_v7.
>> 
>> This patch-set tries to address the different issues in the global
>> picture of MTTCG, presented on the wiki.
>> 
>> == Needed patch for our work ==
>> 
>> Some preliminaries are needed for our work:
>>  * current_cpu doesn't make sense in mttcg so a tcg_executing flag is
>>    added to the CPUState.
>
> Can't you just make it a TLS ?
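For illustration, a thread-local current_cpu would look roughly like the
sketch below; this is not code from the series, and the thread function
name is only indicative.

    /* Sketch only: current_cpu as a per-thread pointer.  Each vCPU thread
     * would set it when entering its execution loop. */
    __thread CPUState *current_cpu;

    static void *tcg_vcpu_thread_fn(void *arg)
    {
        CPUState *cpu = arg;

        current_cpu = cpu;      /* visible only to this vCPU thread */
        /* ... per-vCPU execution loop ... */
        return NULL;
    }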
>
>>  * We need to run some work safely when all VCPUs are outside their execution
>>    loop. This is done with the async_run_safe_work_on_cpu function introduced
>>    in this series.
>>  * QemuSpin lock is introduced (POSIX only for now) to allow faster
>>    handling of atomic instructions.
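For reference, the QemuSpin mentioned in that last bullet could be as
small as a test-and-set lock built on the compiler atomic builtins; the
series' actual API may differ, so treat this as a sketch only.

    /* Minimal spinlock sketch; the series' real QemuSpin may differ. */
    typedef struct QemuSpin {
        int value;
    } QemuSpin;

    static inline void qemu_spin_init(QemuSpin *spin)
    {
        __atomic_store_n(&spin->value, 0, __ATOMIC_RELAXED);
    }

    static inline void qemu_spin_lock(QemuSpin *spin)
    {
        /* test-and-test-and-set: swap in 1, spin while already held */
        while (__atomic_exchange_n(&spin->value, 1, __ATOMIC_ACQUIRE)) {
            while (__atomic_load_n(&spin->value, __ATOMIC_RELAXED)) {
                /* busy wait; a real version would add a pause/yield hint */
            }
        }
    }

    static inline void qemu_spin_unlock(QemuSpin *spin)
    {
        __atomic_store_n(&spin->value, 0, __ATOMIC_RELEASE);
    }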
>
> How do you handle the memory model ? IE, ARM and PPC are weakly ordered
> while x86 is (mostly) strongly ordered, so emulating ARM/PPC on x86 is
> fine but emulating x86 on ARM or PPC will lead to problems unless you
> generate memory barriers with every load/store...

This is the next chunk of work. We have Alvise's LL/SC patches which
allow us to do proper emulation of ARM's load/store-exclusive behaviour,
and any weakly ordered target will have to use such constructs.

Currently the plan is to introduce a barrier TCG op which will translate
to the strongest barrier the backend provides. Even x86 should be using
barriers to ensure cross-core visibility, which then leaves only
load/store re-ordering within a single core.
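To make that concrete, the sort of thing a front end might emit is
sketched below; the op doesn't exist yet, so the name and flags are
purely illustrative of the shape it could take.

    /* Sketch: what a guest front end might emit for a full fence.
     * Requires the TCG op generator headers; names are illustrative. */
    static void gen_full_barrier(void)
    {
        /* one TCG op, lowered by each backend: mfence on x86, dmb ish
         * on ARM, sync on PPC, or nothing if the host ordering is
         * already at least as strong as the guest requires. */
        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
    }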

> At least on POWER7 and later on PPC we have the possibility of setting
> the attribute "Strong Access Ordering" with mremap/mprotect (I don't
> remember which one) which gives us x86-like memory semantics...
>
> I don't know if ARM supports something similar. On the other hand, when
> emulating ARM on PPC or vice-versa, we can probably get away with no
> barriers.
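For reference, on Linux/powerpc that attribute is exposed as a protection
flag, so marking guest RAM would be roughly the following; a powerpc-only
sketch, assuming PROT_SAO from <asm/mman.h>, with error handling elided.

    /* Sketch: mark a region of guest RAM as Strong Access Ordering on a
     * PPC host.  PROT_SAO is Linux/powerpc specific. */
    #include <sys/mman.h>
    #include <asm/mman.h>       /* PROT_SAO (powerpc only) */

    static int mark_guest_ram_sao(void *ram, size_t size)
    {
        return mprotect(ram, size, PROT_READ | PROT_WRITE | PROT_SAO);
    }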
>
> Do you expose some kind of guest memory model info to the TCG backend so
> it can decide how to handle these things ?
>
>> == Code generation and cache ==
>> 
>> As QEMU stands, there is no protection at all against two threads
>> attempting to generate code at the same time or modifying a
>> TranslationBlock.
>> The "protect TBContext with tb_lock" patch addresses the issue of code
>> generation and makes all the tb_* functions thread safe (except
>> tb_flush).
>> This raised the question of one or multiple caches. We chose to use one
>> unified cache because it's easier as a first step and since the
>> structure of QEMU effectively has a ‘local’ cache per CPU in the form
>> of the jump cache, we don't see the benefit of having two pools of TBs.
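The locking pattern that patch enforces around translation is roughly the
following sketch; lookup_tb() and generate_tb() are illustrative
stand-ins, not the series' actual helpers.

    /* Sketch of the locking pattern around code generation: look up a TB
     * lock-free, and only generate a new one under tb_lock so two vCPU
     * threads never write TBContext concurrently. */
    static TranslationBlock *lookup_or_translate(CPUState *cpu,
                                                 target_ulong pc)
    {
        TranslationBlock *tb = lookup_tb(cpu, pc);   /* fast path */

        if (tb == NULL) {
            tb_lock();
            tb = lookup_tb(cpu, pc);     /* re-check under the lock */
            if (tb == NULL) {
                tb = generate_tb(cpu, pc);   /* writes TBContext */
            }
            tb_unlock();
        }
        return tb;
    }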
>> 
>> == Dirty tracking ==
>> 
>> Protecting the I/Os:
>> To allow all VCPU threads to run at the same time we need to drop the
>> global_mutex as soon as possible. I/O accesses need to take the mutex.
>> This is likely to change when
>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
>> is upstreamed.
>> 
>> Invalidation of TranslationBlocks:
>> We can have all VCPUs running during an invalidation. Each VCPU is able
>> to clean its own jump cache, as it lives in CPUState, so that can be
>> handled by a simple call to async_run_on_cpu. However tb_invalidate
>> also writes to the TranslationBlock, which is shared as we have only
>> one pool.
>> Hence this part of the invalidation requires all VCPUs to exit before
>> it can be done, which is why async_run_safe_work_on_cpu is introduced
>> to handle this case.
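The per-vCPU half of that can be queued with a plain async_run_on_cpu
call, roughly as sketched below; the exact callback signature and
jump-cache field layout may differ between versions.

    /* Sketch: queue a jump-cache flush on every vCPU.  Each vCPU clears
     * its own cache from its own thread, so no extra locking is needed
     * for the per-CPU data. */
    static void flush_jmp_cache_work(void *data)
    {
        CPUState *cpu = data;

        memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
    }

    static void flush_all_jmp_caches(void)
    {
        CPUState *cpu;

        CPU_FOREACH(cpu) {
            async_run_on_cpu(cpu, flush_jmp_cache_work, cpu);
        }
    }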
>
> What about the host MMU emulation ? Is that multithreaded ? It has
> potential issues when doing things like dirty bit updates into guest
> memory, those need to be done atomically. Also TLB invalidations on ARM
> and PPC are global, so they will need to invalidate the remote SW TLBs
> as well.
>
> Do you have a mechanism to synchronize with another thread ? IE, make it
> pop out of TCG if already in and prevent it from getting in ? That way
> you can "remotely" invalidate its TLB...
>
>> == Atomic instructions ==
>> 
>> For now only ARM on x64 is supported by using a cmpxchg instruction.
>> Specifically, the limitation of this approach is that it is harder to
>> support 64-bit ARM on a host architecture that is multi-core but only
>> supports 32-bit cmpxchg (we believe this could be the case for some PPC
>> cores).
>
> Right, on the other hand 64-bit will do fine. But then x86 has 2-value
> atomics nowadays, doesn't it ? And that will be hard to emulate on
> anything. You might need to have some kind of global hashed lock list
> used by atomics (hash the physical address) as a fallback if you don't
> have a 1:1 match between host and guest capabilities.
>
> Cheers,
> Ben.
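Something along the lines Ben suggests could be as simple as hashing the
guest physical address into a fixed array of spinlocks and serializing
the emulated read-modify-write under the chosen lock; a sketch only,
nothing like this is implemented yet.

    /* Sketch of a hashed-lock fallback for guest atomics the host can't
     * express natively.  Sizes are arbitrary; QemuSpin as sketched above. */
    #define ATOMIC_LOCK_COUNT 256              /* power of two */

    static QemuSpin atomic_locks[ATOMIC_LOCK_COUNT];

    static inline QemuSpin *atomic_lock_for(hwaddr addr)
    {
        /* drop the low bits so accesses within one cache line share a lock */
        return &atomic_locks[(addr >> 6) & (ATOMIC_LOCK_COUNT - 1)];
    }

    /* usage:
     *     QemuSpin *lock = atomic_lock_for(paddr);
     *     qemu_spin_lock(lock);
     *     ... emulate the guest read-modify-write non-atomically ...
     *     qemu_spin_unlock(lock);
     */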

-- 
Alex Bennée


