qemu-devel



From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC v6 00/14] Slow-path for atomic instruction translation
Date: Mon, 14 Dec 2015 11:17:53 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0


On 14/12/2015 11:04, alvise rigo wrote:
> In any case, what I proposed in the mttcg based v5 was:
> - A LL ensures that the TLB_EXCL flag is set on all the CPU's TLB.
> This is done by querying a TLB flush to all (not exactly all...) the
> CPUs. To be 100% safe, probably we should also wait that the flush is
> actually performed
> - A TLB_EXCL flag set always forces the slow-path, allowing the CPUs
> to check for possible collision with a "exclusive memory region"
> 
> Now, why the fact of querying the flush (and possibly ensuring that
> the flush has been actually done) should not be enough?

There will always be a race where the normal store fails.  I haven't
studied your code enough to point at the exact race, but that isn't
needed: it's enough to prove that what you're trying to do is impossible.
Mind, I also believed for a long time that it was possible to do it!

If we have two CPUs, with CPU 0 executing LL and the CPU 1 executing a
store, you can model this as a consensus problem.  For example, CPU 0
could propose that the subsequent SC succeeds, while CPU 1 proposes that
it fails.  The outcome of the SC instruction depends on who wins.

Therefore, implementing LL/SC requires (on both CPU 0 and CPU 1, and
hence for both LL/SC and normal stores) an atomic primitive with
consensus number >= 2.  Other than LL/SC itself, the commonly available
operations satisfying this requirement are test-and-set (consensus
number 2) and compare-and-swap (infinite consensus number).  Normal
memory reads and writes (called "atomic registers" in multi-processing
research lingo) have consensus number 1, which is not enough.
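
To make the consensus framing concrete, here is a minimal sketch of a
two-party consensus object built on compare-and-swap, using C11 atomics.
The names (Consensus, consensus_propose, UNDECIDED) are made up for this
illustration and are not QEMU code.  CPU 0 would propose "SC succeeds"
(1) and CPU 1 would propose "SC fails" (0); whoever gets its CAS in first
decides the outcome for both.

```c
#include <stdatomic.h>

#define UNDECIDED (-1)   /* illustrative sentinel: no proposal has won yet */

/* Hypothetical consensus object: one atomic cell holding the decision. */
typedef struct {
    _Atomic int decision;
} Consensus;

static void consensus_init(Consensus *c)
{
    atomic_init(&c->decision, UNDECIDED);
}

/*
 * Propose a value (must not be UNDECIDED) and return the agreed value.
 * The first caller whose CAS succeeds wins; every later caller gets the
 * winner's value back, so all callers return the same result.
 */
static int consensus_propose(Consensus *c, int value)
{
    int expected = UNDECIDED;

    if (atomic_compare_exchange_strong(&c->decision, &expected, value)) {
        return value;        /* we won: our proposal is the decision */
    }
    return expected;         /* we lost: CAS stored the winner in expected */
}
```

With plain loads and stores in place of the CAS, a "check, then write"
sequence leaves a window in which both sides can believe they won, which
is the consensus-number-1 limitation described above.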

If the host had LL/SC, CPU 1 could in principle delegate its side of the
consensus problem to the processor; but even that is not a solution
because processors constrain the instructions that can appear between
the load and the store, and this could cause an infinite sequence of
spurious failed SCs.  Another option is transactional memory, but it's
also too slow for normal stores.

The simplest solution is not to implement full LL/SC semantics; instead,
similar to linux-user, an SC operation can perform a cmpxchg from the
value fetched by LL to the argument of SC.  This bypasses the issue
because stores do not have to be instrumented at all, but it does mean
that the emulation suffers from the ABA problem.
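
A rough sketch of that cmpxchg-based scheme, and of how ABA shows up in
it, could look like the following.  It uses C11 atomics; VCpuExcl,
emu_ll, emu_sc and demo_aba are hypothetical names invented for this
example and do not correspond to the actual linux-user or softmmu code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive state (not QEMU's real structures). */
typedef struct {
    _Atomic uint32_t *excl_addr;   /* address armed by the last LL, or NULL */
    uint32_t excl_val;             /* value that the LL observed */
} VCpuExcl;

/* LL: remember the address and the value that was loaded. */
static uint32_t emu_ll(VCpuExcl *cpu, _Atomic uint32_t *addr)
{
    cpu->excl_addr = addr;
    cpu->excl_val = atomic_load(addr);
    return cpu->excl_val;
}

/*
 * SC: succeed only if memory still holds the value seen by LL.
 * Returns true if the store was performed.
 */
static bool emu_sc(VCpuExcl *cpu, _Atomic uint32_t *addr, uint32_t newval)
{
    bool ok = false;

    if (cpu->excl_addr == addr) {
        uint32_t expected = cpu->excl_val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    cpu->excl_addr = NULL;         /* the reservation is consumed either way */
    return ok;
}

/* ABA: two intervening plain stores restore the old value (1 -> 2 -> 1). */
static bool demo_aba(void)
{
    _Atomic uint32_t mem = 1;
    VCpuExcl cpu0 = { 0 };

    emu_ll(&cpu0, &mem);
    atomic_store(&mem, 2);         /* stands in for another CPU's store */
    atomic_store(&mem, 1);         /* ...which then writes the old value back */
    return emu_sc(&cpu0, &mem, 42);  /* cmpxchg cannot tell; SC succeeds */
}
```

Here demo_aba() returns true, whereas with full LL/SC semantics the SC
would be expected to fail because the location was written in between.
Note that the plain stores are ordinary atomic_store calls with no
instrumentation, which is the whole point of the approach.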

TLB_EXCL is also a middle ground, a little bit stronger than cmpxchg.
It's more complex and more accurate, but also not perfect.  That is
okay, but it has to be documented.

Paolo


