Also, unless I'm misunderstanding something our implementation of LR/SC is
pretty broken. We're just using a CAS to check if the value changed, which
suffers from the ABA problem that LR/SC is there to fix in the first place. I
might be missing something here, though, as it looks like MIPS is doing
essentially the same thing.
All of our load-lock/store-conditional implementations do the same.
You are correct that we do not implement the exact correct semantics. Correct semantics, as far as I know, cannot be modeled without emulating everything in a serial context, including caches.
We therefore make a considered choice to observe that while ll/sc can be used to do all kinds of wild and woolly things, no one actually does so. What people actually do is write portable code, using c/c++ atomic primitives. And that all of these can be done with cmpxchg, because that's what x86 has.
Using that choice, we can provide multi-threaded smp emulation that, demonstrably, works.
r~