qemu-devel

Re: [Qemu-devel] Help on TLB Flush


From: Mark Burton
Subject: Re: [Qemu-devel] Help on TLB Flush
Date: Fri, 13 Feb 2015 08:16:04 +0100

Up top - thanks Peter, I think you may have given us an idea!

> On 12 Feb 2015, at 23:10, Lluís Vilanova <address@hidden> wrote:
> 
> Mark Burton writes:
> 
>>> On 12 Feb 2015, at 16:38, Alexander Graf <address@hidden> wrote:
>>> 
>>> 
>>> 
>>> On 12.02.15 15:58, Peter Maydell wrote:
>>>> On 12 February 2015 at 14:45, Alexander Graf <address@hidden> wrote:
>>>>> almost nobody except x86 does global flushes
>>>> 
>>>> All ARM TLB maintenance operations have both "this CPU only"
>>>> and "all TLBs in the Inner Shareable domain" [that's ARM-speak
>>>> for "every CPU core in the cluster"] variants (the latter
>>>> being the TLB *IS operations). Looking at Linux's
>>>> arch/arm64/mm/tlb.S and arch/arm64/include/asm/tlbflush.h
>>>> most of the operations defined there use the IS variants.
>>> 
>>> Wow, did anyone benchmark this? I know that PPC switched away from
>>> global flushes and instead tracks the CPUs a task was running on to
>>> limit the scope of CPUs that need to flush.
> 
>> Doesn’t that mean you have to signal a specific CPU to cause it to flush 
>> itself…. Isn’t that in itself expensive? Do you have to organise some sort 
>> of atomicity yourself around that too?
> 
> Yup. AFAIR, Linux in x86-64 queues a request to a per-CPU request list, and
> uses IPIs to signal these types of operations to the target CPU:
> 
>  http://lxr.free-electrons.com/source/kernel/smp.c?v=2.6.32#L386
> 
> Waiting for completion is implemented on top by incrementing some counter from
> each CPU, and waiting for it to have the correct final value.
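
For reference, here is a rough sketch of the scheme described above as I understand it: a per-CPU request list, a wake-up standing in for the IPI, and a counter that the requester waits on. It is written in Python only for brevity. `ToyCPU`, `Completion` and `flush_on` are hypothetical names made up for this illustration; none of this is Linux or QEMU code.

```python
import queue
import threading


class ToyCPU:
    """Hypothetical stand-in for one CPU; not a Linux or QEMU structure."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.requests = queue.Queue()   # per-CPU request list
        self.tlb = {0x1000: 0x8000}     # toy virtual -> physical mapping
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Blocking on the queue plays the role of taking the IPI.
        while True:
            func = self.requests.get()
            func(self)


class Completion:
    """Counter incremented by each target CPU; the requester waits on it."""

    def __init__(self, expected):
        self.expected = expected
        self.count = 0
        self.cond = threading.Condition()

    def signal(self):
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait(self, timeout=5.0):
        # Returns True once the counter reaches its final value,
        # or False if the timeout expires first.
        with self.cond:
            return self.cond.wait_for(
                lambda: self.count == self.expected, timeout)


def flush_on(cpus):
    """Ask only the given CPUs to flush their own TLB, then wait for all."""
    done = Completion(len(cpus))

    def do_flush(cpu):
        cpu.tlb.clear()     # each CPU flushes only its own TLB
        done.signal()

    for cpu in cpus:
        cpu.requests.put(do_flush)
    return done.wait()


cpus = [ToyCPU(i) for i in range(4)]
targets = cpus[1:3]         # e.g. the CPUs a task has actually run on
ok = flush_on(targets)
```

Only the targeted CPUs lose their entries here; CPUs 0 and 3 are never signalled, which is the point of tracking where a task has run.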

If the kernel is doing this, then effectively, for x86, each CPU only flushes 
its own TLB (from the perspective of QEMU), correct?
In that case, for QEMU itself on x86, we don’t need to implement a global 
flush, and hence we don’t need to build the mechanism to sync?

If I understand correctly, then the processor that causes some pain is ARM, 
which has (and uses) global flush, but the mitigating factor is that those 
flushes can be asynchronous so long as they complete before a memory barrier.
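
To make the asynchronous idea concrete, here is a minimal sketch, assuming the broadcast flush only has to be finished by the time the issuing core reaches its next barrier (DSB on ARM). `ToySystem`, `tlbi_broadcast` and `barrier` are names invented for this example, and a thread pool stands in for the other cores doing the work; it is not how QEMU is structured today.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class ToySystem:
    """Hypothetical multi-core model; names are illustrative, not QEMU's."""

    def __init__(self, n_cores):
        # One toy TLB (virtual -> physical) per core.
        self.tlbs = [{0x1000: 0x8000} for _ in range(n_cores)]
        self.pool = ThreadPoolExecutor(max_workers=n_cores)
        # Flushes each core has issued that may not have completed yet.
        self.pending = [[] for _ in range(n_cores)]

    def tlbi_broadcast(self, issuer):
        """Start a flush of every core's TLB and return immediately."""
        for tlb in self.tlbs:
            self.pending[issuer].append(self.pool.submit(tlb.clear))

    def barrier(self, issuer):
        """Block until every flush this core issued has completed."""
        outstanding, self.pending[issuer] = self.pending[issuer], []
        wait(outstanding)
        return len(outstanding)


system = ToySystem(n_cores=4)
system.tlbi_broadcast(issuer=0)
# Core 0 could keep executing guest code here; nothing has been waited on yet.
completed = system.barrier(issuer=0)
all_flushed = all(not tlb for tlb in system.tlbs)
system.pool.shutdown()
```

The only synchronisation point is `barrier`, so the cost of waiting is paid once per barrier rather than once per flush.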

Cheers

Mark.


> 
> If something were implemented on these lines, it could be used as a generic
> cross-CPU event messaging infrastructure (plus some interrupt bit in the CPU
> structure that TCG would check to break away from guest code; I believe
> something similar is already being used - icount? -).
> 
> PS: To be honest, I still don't know which TLBs we're talking about here, and
>    which cases trigger these TLB flush operations.
> 
> 
> Cheers,
>  Lluis
> 
> -- 
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
> Tollbooth

