lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Atomic operations


From: Marc Nieper-Wißkirchen
Subject: Re: Atomic operations
Date: Fri, 12 Aug 2022 16:45:02 +0200

I agree that casr/tasr (maybe also casi/tasi) are the most important instructions.  As long as it can be implemented on the supported set of CPUs, I don't see why we shouldn't have _c, _s, _i, _l variants alongside the word-size variant.

The only problem I see is that an atomic store in one thread together with an atomic load of the same memory location in another thread will become costly if release-acquire semantics are needed.  This can be emulated with casr/tasr (I assume that their semantics is sequentially consistent) but this is a costly operation while on many important CPUs like x86 all release-acquire semantics actually do not need special instructions.

It is probably enough to provide a global release and an acquire instruction (which does the equivalent of C11's atomic_thread_fence (memory_order_release) and atomic_thread_fence (memory_order_acquire), respectively).  (These would be no-ops on x86, for example.)  And maybe also an operation corresponding to atomic_thread_fence (memory_order_seq_cst).

Am Fr., 12. Aug. 2022 um 13:16 Uhr schrieb Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com>:
Em qui., 11 de ago. de 2022 às 17:52, Marc Nieper-Wißkirchen
<marc.nieper+gnu@gmail.com> escreveu:

  Hi Marc,

> PS: This document may be helpful as well: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.

  Right now, my idea of what should be done would be to have a
jit_casr(bool_return_in_register, address_in_register,
new_value_in_register, old_value_in_register);

  It would need a new pattern of 4 registers instruction, as only
the first one is modified. The current qmul* and qdiv* ones put
result in the first two register arguments.

  And while at that, also have a:

jit_tasr(old_value_in_register, address_in_register, new_value_in_register);

  Likely can have the _c, _s, _i, and _l modifiers, or have only a 32
bit variant.

  Very few cpus should not have some construct for it. For those, we just add
a software fallback with some kind of spin lock. In either case,  we "cheat" and
look at what assembly gcc generates or the kernel uses for the
equivalent construct.

  About the ABA problem, we ignore it. Just have some note, describe the issue
and how to avoid it. I would suggest using it with the address_in_register value
as a special lock, not the actual variable.

> Am Do., 11. Aug. 2022 um 21:35 Uhr schrieb Marc Nieper-Wißkirchen <marc.nieper+gnu@gmail.com>:
>>
>> Hi Paulo,
>>
>> Am Di., 9. Aug. 2022 um 12:40 Uhr schrieb Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com>:
>>>
>>> > Here is a minimal API, albeit written for Scheme: https://srfi.schemers.org/srfi-230/srfi-230.html.  What is an atomic (fixnum) box there should be word-sized memory location in GNU lightning.  Atomic pairs (two words) are important for some algorithms.  If they are not easily implementable on a particular architecture, GNU lightning should report this so that the user can call C library routines (from stdatomic) or GCC builtins themselves.
>>> >
>>> > As for GNU lightning instructions, we would probably at least need the following instructions (for word-sized integers):
>>> >
>>> > - loads and stores with relaxed memory order (if I have understood correctly, we can use the usual GNU lightning load/store instructions)
>>> > - loads with acquire memory order
>>> > - stores with release memory order
>>> > - swap (load and store) with relaxed memory order
>>> > - swap (load and store) with acquire-release memory order
>>> > - compare-and-swap with relaxed memory order
>>> > - compare-and-swap with acquire-release memory order
>>>
>>>   If lightning were to provide such primites, I believe it should
>>> only "make a contract" of supporting strong compare-and-swap,
>>> not on shared memory (a different process might die with the
>>> lock held), to allow some kind of mutex implementation, what
>>> could be expensive if there are too many waiters spinning.
>>
>>
>> I am not sure whether I have understood your "contract".
>>
>> In any case, if a mutex is needed we could just call the GCC-provided software implementation in libatomic ([1]) (after checking that libatomic's ABI is supposed to be stable and works with different compilers as well).  Alternatively, we can roll out our own hash table of mutexes where the hash is calculated from the memory address that is to be accessed atomically in software.
>>
>>>
>>>   Still not trivial to get it on all supported ports, at least with the same
>>> semantics, because if need to implement in an external function call,
>>> it would need to save/restore all JT_R* and JIT_F* registers in the
>>> worst case. Most times could just inline what gcc generates.
>>
>>
>> As atomic operations are usually quite costly, the overhead of saving and restoring registers is probably not too bad.
>>
>>>
>>> > And the same, if supported, for double-word-sized memory operands.
>>> >
>>> > And then the following arithmetic operations (in relaxed and acquire-release semantics):
>>> >
>>> > - fetch-add
>>> > - fetch-sub
>>> > - fetch-or
>>> > - fetch-xor
>>> > - fetch-and
>>> >
>>> > And then an instruction to emit a memory order (release, acquire, acquire-release, sequential consistency) as atomic_thread_fence in stdatomic.h.
>>> > To simplify the interface, it may make sense to offer all operations (but the thread fence instruction) only with relaxed semantics so that the programmer has to emit thread fence instructions explicitly.
>>>
>>>   The simplest way to implement it is to have it have some PIC code
>>> implementing it, and use two jit_jmpr to/from the code, but lightning
>>> would still treat the jit_jmpr as function calls, that is, invalidate non callee
>>> save registers.
>>
>>
>> If internally, lightning uses an instruction different from jit_jmpr to "call" the atomic code, it can have more detailed knowledge about the registers that have to be saved.
>>
>>>
>>>
>>>   As long as using only jmpr, and not modifying registers, should be
>>> enough to call jit_live() once "returning" for any non callee save register
>>> used in the construct, or that must be alive for other use.
>>>
>>> Thanks!
>>> Paulo
>>
>>
>> --
>>
>> [1] https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libatomic;h=7e61d96034e3b2f3c697d30e9d88ef9482f047e5;hb=HEAD
>>

reply via email to

[Prev in Thread] Current Thread [Next in Thread]