
Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.


From: Mark Burton
Subject: Re: [Qemu-devel] [RFC 01/10] target-arm: protect cpu_exclusive_*.
Date: Tue, 3 Mar 2015 16:29:13 +0100

Hi Peter, thanks for the feedback

So - what we have discovered: using the slow path as discussed incurs a
significant performance hit (no surprise), and likewise the user-mode exit
mechanism. However, we are investigating that in parallel...

For now, however, we have left the TCG doing the majority of the work.
(This was supposed to be the 'quick and easy' approach - we have already
discussed better approaches to atomic instructions at length - but they are
even more invasive!)

We have tried several schemes and not found anything totally satisfactory: we
have a gremlin. There seems to be a race condition somewhere that effectively
means the 'ticket' mechanism inside the guest kernel goes off by one, and
therefore the guest kernel stalls as it never gets the ticket it's looking
for... (it's off by one in the 'one too few' sense: both CPUs end up waiting
for tickets that are higher than the current finished ticket).

The mechanism that hands out the tickets is of course an LDREX/STREX loop
(sketched below), leading us to believe that is the cause of our problems;
however, I am increasingly sceptical.
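
For anyone not familiar with it, this is roughly the shape of the guest's
ticket lock - a minimal sketch in C, with a GCC atomic builtin standing in
for the kernel's actual LDREX/STREX retry loop (names and layout are
illustrative, not the guest kernel's real code):

#include <stdint.h>

/* Illustrative ticket lock. The guest kernel implements the
 * fetch-and-add below as an LDREX/STREX retry loop. */
typedef struct {
    uint16_t next;   /* next ticket to hand out */
    uint16_t owner;  /* ticket currently being served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
    /* Atomically take a ticket - this is the LDREX/STREX pair. */
    uint16_t t = __atomic_fetch_add(&l->next, 1, __ATOMIC_ACQUIRE);

    /* If a STREX were reported as failed after its store had in fact
     * landed, the retry would consume a second ticket; the orphaned
     * ticket is never unlocked, 'owner' stops advancing, and every
     * CPU spins on a ticket above the last finished one - exactly
     * the stall described above. */
    while (__atomic_load_n(&l->owner, __ATOMIC_ACQUIRE) != t) {
        /* cpu_relax()/WFE in the real kernel */
    }
}

static void ticket_unlock(ticket_lock_t *l)
{
    __atomic_store_n(&l->owner, l->owner + 1, __ATOMIC_RELEASE);
}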


Our scheme is actually quite simple now:
- We keep the same overall scheme as exists upstream today: we store the
address and value.
- We provide a lock which is used for LDREX, STREX, CLREX, and in the
raise_exception code.
- For LDREX we check that no other CPU is tagging the address. Because of the
lock, we can be sure no STREX is executing, so we are always safe to 'steal'
a tag from another CPU (which is what the ARM ARM defines).
- STREX and CLREX are similar to upstream...
- We are also careful to invalidate the address tag if we take an exception.
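
To make that concrete, here is a standalone C model of the scheme (purely
illustrative - excl_lock, excl_addr, excl_val and the flat NCPUS array are
invented names for this sketch, not the actual patch against cpu_exclusive_*):

#include <pthread.h>
#include <stdint.h>

#define EXCL_NONE ((uint64_t)-1)
#define NCPUS 4

typedef struct {
    uint64_t excl_addr;   /* tagged address, EXCL_NONE when clear */
    uint32_t excl_val;    /* value observed by LDREX */
} cpu_excl_t;

static pthread_mutex_t excl_lock = PTHREAD_MUTEX_INITIALIZER;
static cpu_excl_t cpus[NCPUS];

static uint32_t ldrex(int cpu, volatile uint32_t *addr)
{
    pthread_mutex_lock(&excl_lock);
    /* Steal the tag from any other CPU monitoring this address; the
     * lock guarantees no STREX is mid-flight while we do so. */
    for (int i = 0; i < NCPUS; i++) {
        if (i != cpu && cpus[i].excl_addr == (uint64_t)(uintptr_t)addr) {
            cpus[i].excl_addr = EXCL_NONE;
        }
    }
    cpus[cpu].excl_addr = (uint64_t)(uintptr_t)addr;
    cpus[cpu].excl_val = *addr;
    pthread_mutex_unlock(&excl_lock);
    return cpus[cpu].excl_val;
}

/* Returns 0 on success, 1 on failure, mirroring STREX's status. */
static int strex(int cpu, volatile uint32_t *addr, uint32_t val)
{
    int fail = 1;
    pthread_mutex_lock(&excl_lock);
    if (cpus[cpu].excl_addr == (uint64_t)(uintptr_t)addr &&
        cpus[cpu].excl_val == *addr) {      /* re-read and compare */
        *addr = val;
        fail = 0;
    }
    cpus[cpu].excl_addr = EXCL_NONE;        /* monitor always cleared */
    pthread_mutex_unlock(&excl_lock);
    return fail;
}

/* CLREX; the exception path does the same tag invalidation under
 * the same lock. */
static void clrex(int cpu)
{
    pthread_mutex_lock(&excl_lock);
    cpus[cpu].excl_addr = EXCL_NONE;
    pthread_mutex_unlock(&excl_lock);
}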

The drawback of this scheme is of course that we are not totally protected
against non-exclusive writes, which could potentially fall between our
re-reading a value and checking it against our saved value. But I am not
convinced that is our problem (I have checked to make sure we don't get
non-exclusive writes).
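
For clarity, the window looks like this (a hypothetical interleaving -
vCPU1's plain stores never take our lock):

  vCPU0: LDREX [x] -> 0    tag x, save excl_val = 0
  vCPU1: STR   [x] <- 1    plain store, lock not taken
  vCPU1: STR   [x] <- 0    old value restored (classic ABA)
  vCPU0: STREX [x] <- 2    re-read gives 0 == excl_val, so it 'succeeds'

Architecturally vCPU1's stores should have cleared vCPU0's monitor, but a
value comparison cannot see them.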

This scheme would seem to be elegant and simple - but it suffers from the one
small drawback of still exhibiting the issue described above...


Cheers

Mark.

ps. on our bug - we believe the STREX is somehow being marked as failed while
actually succeeding in writing something. There are only 3 ways the STREX can fail:
1/ the address doesn't match - in which case no store will be attempted
2/ the value doesn't match - which means no store is attempted
3/ the store starts, but causes a TLB fill/exception...

The 3rd has 2 possibilities: either the TLB is filled and the store goes ahead
totally normally - there should be no 'fail' - or an exception is generated,
in which case we will longjmp away and never return.
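
In pseudo-helper form the decision tree is (just a sketch against QEMU
internals, so not compilable standalone; cpu_ldl_data/cpu_stl_data are the
TLB-going accessors Peter mentions below):

/* Sketch of the STREX helper, numbering the three failure paths
 * above. Returns 0 on success, 1 on failure. */
static uint32_t strex_sketch(CPUARMState *env, uint32_t addr, uint32_t val)
{
    if (env->exclusive_addr != addr) {
        return 1;                   /* 1/ address mismatch: no store */
    }
    if (cpu_ldl_data(env, addr) != env->exclusive_val) {
        return 1;                   /* 2/ value mismatch: no store */
    }
    /* 3/ the store itself: either the TLB is filled and the store
     * completes normally (no 'fail' is ever reported), or the fault
     * path longjmps out of the helper and we never get back here. */
    cpu_stl_data(env, addr, val);
    env->exclusive_addr = -1;
    return 0;
}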

Am I missing something?




> On 2 Mar 2015, at 13:27, Peter Maydell <address@hidden> wrote:
> 
> On 27 February 2015 at 16:54, Mark Burton <address@hidden> wrote:
>> 
>>> On 26 Feb 2015, at 23:56, Peter Maydell <address@hidden> wrote:
>>> cpu_physical_memory_rw would bypass the TLB and so be much slower.
>>> Make sure you use the functions which go via the TLB if you do
>>> this in a helper (and remember that they will longjmp out on a
>>> tlb miss!)
>> 
>> At this point speed isn’t our main concern - it’s simplicity of 
>> implementation - we want it to work, then we can worry about a better 
>> implementation (which likely should not go this path at all - as discussed 
>> above).
>> Given that - isn’t it reasonable to pass through cpu_physical_memory_rw - 
>> and hence not have to worry about the longjmp? Or am I missing something?
> 
> If you use cpu_physical_memory_rw you need to do the
> virt-to-phys translation by hand (preferably via the TLB).
> That might be something you needed to do anyway if we want
> to have architecturally correct monitors that work on
> physaddrs rather than vaddrs, but if not then the two
> step process is a bit awkward.
> 
>>> Pretty sure we've already discussed how the current ldrex/strex
>>> implementation is not architecturally correct. I think this is
>>> another of those areas.
>> 
>> We have indeed discussed this - but this is a surprise.
> 
> You're right that I didn't specifically realise this exact
> part of our incorrectness earlier.
> 
> -- PMM


