Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
From: Mark Burton
Subject: Re: [Qemu-devel] [RFC PATCH] target-arm: protect cpu_exclusive_*.
Date: Thu, 18 Dec 2014 15:20:45 +0100
> On 18/12/2014 13:24, Alexander Graf wrote:
>> That's the nice thing about transactions - they guarantee that no other
>> CPU accesses the same cache line at the same time. So you're safe
>> against other vcpus even without blocking them manually.
>>
>> For the non-transactional implementation we probably would need an "IPI
>> others and halt them until we're done with the critical section"
>> approach. But I really wouldn't concentrate on making things fast on old
>> CPUs.
>
> The non-transactional implementation can use softmmu to trap access to
> the page from other VCPUs. This makes it possible to implement (at the
> cost of speed) the same semantics on all hosts.
>
> Paolo
I believe what you’re describing - using transactional memory, or using softmmu -
amounts to either option 3 below or option 4.
Relying on it entirely was option 4.
Seems to me, the problem with that option is that support for some hosts will
be a pain, and covering everything will take some time :-(
Option 3 suggests that we build a ‘slow path’ mechanism first - make sure that
works (as a backup), and then add optimisations for specific hosts/guests
afterwards. To me that still seems preferable?
Cheers
Mark.
> On 18 Dec 2014, at 13:24, Alexander Graf <address@hidden> wrote:
>
>
>
> On 18.12.14 10:12, Mark Burton wrote:
>>
>>> On 17 Dec 2014, at 17:39, Peter Maydell <address@hidden> wrote:
>>>
>>> On 17 December 2014 at 16:29, Mark Burton <address@hidden> wrote:
>>>>> On 17 Dec 2014, at 17:27, Peter Maydell <address@hidden> wrote:
>>>>> I think a mutex is fine, personally -- I just don't want
>>>>> to see fifteen hand-hacked mutexes in the target-* code.
>>>>>
>>>>
>>>> Which would seem to favour the helper function approach?
>>>> Or am I missing something?
>>>
>>> You need at least some support from QEMU core -- consider
>>> what happens with this patch if the ldrex takes a data
>>> abort, for instance.
>>>
>>> And if you need the “stop all other CPUs while I do this”
>>
>> It looks like a corner case, but working this through: the ’simple’
>> put-a-mutex-around-the-atomic-instructions approach would indeed need to
>> ensure that no other core was doing anything (which just happens to be true
>> for QEMU today, or else we would have to put a mutex around all writes), in
>> order to handle the case where a store-exclusive could potentially fail
>> because a non-atomic instruction wrote (a different value) to the same
>> address. This is currently guaranteed by the implementation in QEMU - how
>> useful that is I don’t know, but if we break it, we run the risk that
>> something will fail (at the least, we could not claim to have kept things
>> the same).
>>
>> This also has implications for the idea of adding TCG ops, I think...
>> The ideal scenario is that we could ‘fall back’ on the same semantics that
>> are there today - allowing specific target/host combinations to be optimised
>> (and to improve their functionality).
>> But that means that, from within the TCG op, we would need to have a
>> mechanism to cause the other vCPUs to take an exit… etc. In the end, I’m
>> sure it’s possible, but it feels so awkward.
>
> That's the nice thing about transactions - they guarantee that no other
> CPU accesses the same cache line at the same time. So you're safe
> against other vcpus even without blocking them manually.
>
> For the non-transactional implementation we probably would need an "IPI
> others and halt them until we're done with the critical section"
> approach. But I really wouldn't concentrate on making things fast on old
> CPUs.
>
> Also keep in mind that for the UP case we can always omit all the magic
> - we only need to detect when we move into an SMP case (linux-user clone
> or -smp on system).
>
>>
>> To re-cap where we are (for my own benefit if nobody else):
>> We have several propositions in terms of implementing Atomic instructions
>>
>> 1/ We wrap the atomic instructions in a mutex using helper functions (this
>> is the approach others have taken; it’s simple, but it is not clean, as
>> stated above).
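As a concrete illustration of option 1, here is a minimal sketch of what mutex-wrapped helpers might look like (hypothetical names, not the actual QEMU target-arm code). Note that releasing the lock inside each helper avoids holding it across the ldrex/strex pair, but then a plain store in between can still race unless every store takes the same lock - which is exactly the weakness discussed in this thread.

```c
/* Sketch of option 1: a global mutex around the emulated exclusive pair.
 * Hypothetical helper names; not the actual QEMU target-arm code. */
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t excl_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t excl_addr = (uint64_t)-1;  /* address of the reservation */
static uint32_t excl_val;                  /* value seen at ldrex time */

/* ldrex: record the reservation under the lock. */
uint32_t helper_ldrex(uint32_t *mem, uint64_t addr)
{
    pthread_mutex_lock(&excl_mutex);
    excl_addr = addr;
    excl_val = *mem;
    pthread_mutex_unlock(&excl_mutex);
    return excl_val;
}

/* strex: returns 0 on success, 1 on failure (the ARM convention). */
uint32_t helper_strex(uint32_t *mem, uint64_t addr, uint32_t val)
{
    uint32_t fail = 1;
    pthread_mutex_lock(&excl_mutex);
    if (excl_addr == addr && *mem == excl_val) {
        *mem = val;
        fail = 0;
    }
    excl_addr = (uint64_t)-1;   /* reservation consumed either way */
    pthread_mutex_unlock(&excl_mutex);
    return fail;
}
```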
>
> This is horrible. Imagine you have this split approach with a load
> exclusive and then a store, where the load takes the mutex and the
> store releases it. At that point, if the store triggers a segfault you'll be
> left with a dangling mutex.
>
> This stuff really belongs into the TCG core.
>
>>
>> 1.5/ We add a mechanism to ensure that when the mutex is taken, all other
>> cores are ‘stopped’.
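A sketch of what the option 1.5 "stop all other cores" handshake could look like, using a condition-variable rendezvous. All names here are hypothetical; real code would hook into QEMU's run loop so vCPUs reach the safe point at translation-block boundaries.

```c
/* Sketch of option 1.5: park every other vCPU before a critical
 * section. Hypothetical names; real code would hook the run loop. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t stop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stop_cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested;
static int running_vcpus;    /* set to the vCPU count at startup */

/* Called by the vCPU about to run an atomic sequence. */
void begin_critical(void)
{
    pthread_mutex_lock(&stop_lock);
    stop_requested = true;
    while (running_vcpus > 1) {          /* wait until only we run */
        pthread_cond_wait(&stop_cond, &stop_lock);
    }
    pthread_mutex_unlock(&stop_lock);
}

void end_critical(void)
{
    pthread_mutex_lock(&stop_lock);
    stop_requested = false;
    pthread_cond_broadcast(&stop_cond);  /* release the parked vCPUs */
    pthread_mutex_unlock(&stop_lock);
}

/* Polled by every other vCPU at a safe point (e.g. a TB boundary). */
void vcpu_safe_point(void)
{
    pthread_mutex_lock(&stop_lock);
    while (stop_requested) {
        running_vcpus--;
        pthread_cond_broadcast(&stop_cond);  /* wake the requester */
        pthread_cond_wait(&stop_cond, &stop_lock);
        running_vcpus++;
    }
    pthread_mutex_unlock(&stop_lock);
}
```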
>>
>> 2/ We add some TCG ops to effectively do the same thing, but this would give
>> us the benefit of being able to provide better implementations. This is
>> attractive, but we would end up needing ops to cover at least exclusive
>> load/store and atomic compare exchange. To me this looks less than elegant
>> (being pulled close to the target, rather than being able to generalise),
>> but it’s not clear how we would implement the operations as we would like,
>> with a machine instruction, unless we did split them out along these lines.
>> This approach also (probably) requires the 1.5 mechanism above.
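The core primitive that option 2 would add can be illustrated with a host compare-and-swap, which most backends could emit as a single instruction. This is a sketch using the GCC/Clang builtin, not an actual TCG op:

```c
/* Sketch of the primitive a TCG compare-exchange op could map to:
 * a single host cmpxchg via the GCC/Clang builtin. Not an actual op. */
#include <stdbool.h>
#include <stdint.h>

/* Returns true iff *mem still held 'expected' and was replaced. */
bool tcg_cmpxchg32(uint32_t *mem, uint32_t expected, uint32_t desired)
{
    return __atomic_compare_exchange_n(mem, &expected, desired,
                                       false /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

A store-exclusive would then be emitted as a cmpxchg against the value observed at load-exclusive time - with the usual ABA caveat that an intervening write of the same value goes undetected, which is weaker than a real ARM exclusive monitor.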
>
> I'm still in favor of just forcing the semantics of transactions onto
> this. If the host doesn't implement transactions, tough luck - do the
> "halt all others" IPI.
>
>>
>> 3/ We have discussed a ‘h/w’ approach to the problem. In this case, all
>> atomic instructions are forced to take the slow path, and additional
>> flags are added to the memory API. We then deal with the issue closer to the
>> memory, where we can record who has a lock on a memory address. For this to
>> work, we would also either
>> a) need to add an mprotect-type approach to ensure no ‘non-atomic’ writes
>> occur, or
>> b) need to force all cores to mark the page with the exclusive memory as IO
>> or similar, to ensure that all write accesses follow the slow path.
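Option 3's "record who has a lock on a memory address" bookkeeping might look roughly like this: a per-CPU reservation table near the memory model, consulted by the slow path that every store is forced down. All names are hypothetical.

```c
/* Sketch of option 3: a per-address monitor near the memory model.
 * Every slow-path store notifies the monitor; hypothetical names. */
#include <stdint.h>
#include <stdbool.h>
#include <pthread.h>

#define MAX_CPUS 8
static pthread_mutex_t mon_lock = PTHREAD_MUTEX_INITIALIZER;
static struct { bool valid; uint64_t addr; } excl[MAX_CPUS];

/* Load-exclusive: record a reservation for this CPU. */
void monitor_load_exclusive(int cpu, uint64_t addr)
{
    pthread_mutex_lock(&mon_lock);
    excl[cpu].valid = true;
    excl[cpu].addr = addr;
    pthread_mutex_unlock(&mon_lock);
}

/* Any store (atomic or not) clears other CPUs' matching reservations. */
void monitor_notify_store(int cpu, uint64_t addr)
{
    pthread_mutex_lock(&mon_lock);
    for (int i = 0; i < MAX_CPUS; i++) {
        if (i != cpu && excl[i].valid && excl[i].addr == addr) {
            excl[i].valid = false;
        }
    }
    pthread_mutex_unlock(&mon_lock);
}

/* Store-exclusive: succeeds only if this CPU's reservation survived. */
bool monitor_store_exclusive(int cpu, uint64_t addr)
{
    bool ok;
    pthread_mutex_lock(&mon_lock);
    ok = excl[cpu].valid && excl[cpu].addr == addr;
    excl[cpu].valid = false;
    pthread_mutex_unlock(&mon_lock);
    return ok;
}
```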
>>
>> 4/ There is an option to implement exclusive operations within the TCG using
>> mprotect (and signal handlers). I have some concerns about this: would we
>> have to have support for each host O/S? I also think we might end up with a
>> lot of protected regions causing a lot of SIGSEGVs because an errant guest
>> doesn’t behave well - basically, we will need to see the impact on
>> performance. Finally, this will be really painful to deal with for cases
>> where the exclusive memory is held in what QEMU considers IO space!
>> In other words, putting the mprotect inside TCG looks to me like it’s
>> mutually exclusive to supporting a memory-based scheme like (3).
>
> Again, I don't think it's worth caring about legacy host systems too
> much. In a few years from now transactional memory will be commodity,
> just like KVM is today.
>
>
> Alex
>
>> My personal preference is for 3b): it is “safe” - it’s where the hardware is.
>> 3a) is an optimisation of that.
>> To me, (2) is an optimisation again. We are effectively saying: if you are
>> able to do this directly, then you don’t need to pass via the slow path.
>> Otherwise, you always have the option of reverting to the slow path.
>>
>> Frankly - 1 and 1.5 are hacks - they are not optimisations, they are just
>> dirty hacks. However - their saving grace is that they are hacks that exist
>> and “work”. I dislike patching the hack, but it did seem to offer the
>> fastest solution to get around this problem - at least for now. I am no
>> longer convinced.
>>
>> 4/ is something I’d like other people’s views on, too… Is it a better
>> approach? What about the slow path?
>>
>> I increasingly begin to feel that we should really approach this from the
>> other end, and provide a ‘correct’ solution using the memory - then worry
>> about making that faster…
>>
>> Cheers
>>
>> Mark.
>>
>>> semantics linux-user currently uses then that definitely needs
>>> core code support. (Maybe linux-user is being over-zealous
>>> there; I haven't thought about it.)
>>>
>>> -- PMM
>>
>>
>> +44 (0)20 7100 3485 x 210
>> +33 (0)5 33 52 01 77x 210
>>
>> +33 (0)603762104
>> mark.burton
>>