- Understanding C11 and C++11 Atomics
C++11 and C11 Memory Orderings
The new standards provide a number of memory orderings. The simplest to understand is also the strictest: sequentially consistent ordering. It is the default for classic C-style operations on _Atomic() variables. For example:
_Atomic(int) a = 42;
...
a++; // <- This is sequentially consistent
This ordering, abbreviated to seq_cst in the standard, means that all threads observe every operation of this kind in a single global order, and that order is consistent with the order in which each thread's operations were written in the source.
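As a minimal sketch of what "default" means here, the following spells out the same increment three ways. The function name seq_cst_increments is hypothetical and exists only for illustration; the atomic calls are the standard ones from <stdatomic.h>.

```c
#include <stdatomic.h>

/* Hypothetical helper: performs the same sequentially consistent
   increment three times, using three equivalent spellings. */
int seq_cst_increments(void)
{
    _Atomic(int) a = 42;

    a++;                      /* operator form: implicitly seq_cst */
    atomic_fetch_add(&a, 1);  /* generic function: also seq_cst */
    atomic_fetch_add_explicit(&a, 1, memory_order_seq_cst); /* spelled out */

    return atomic_load(&a);   /* 42 + 3 = 45 */
}
```

All three forms request the same ordering; only the _explicit variant lets you ask for anything weaker.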
At the opposite extreme is relaxed ordering, which guarantees atomicity for the operation, but nothing else. To see why atomicity matters, consider a++ on an ordinary, non-atomic int: it may be compiled to a load, an increment, and a store instruction. This isn't atomic; two threads doing it at the same time could both read the same value, increment it, and write back the same result, so one of the increments is lost. An atomic version ensures that, from the perspective of another thread, the read-modify-write sequence appears to happen in a single step. With relaxed ordering, however, the operation can still be reordered freely with respect to operations on other memory locations.
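The lost update described above can be replayed deterministically in a single thread. This is only an illustrative sketch: the function lost_update_replay and the variables t1 and t2 (standing in for each thread's register) are hypothetical, and a real race would of course involve two actual threads.

```c
/* Hypothetical single-threaded replay of an unlucky interleaving of
   two non-atomic increments. t1 and t2 stand in for the registers
   of two different threads. */
int lost_update_replay(void)
{
    int shared = 0;

    int t1 = shared;  /* thread 1: load, sees 0 */
    int t2 = shared;  /* thread 2: load, also sees 0 */
    t1 = t1 + 1;      /* thread 1: increment its private copy */
    t2 = t2 + 1;      /* thread 2: increment its private copy */
    shared = t1;      /* thread 1: store 1 */
    shared = t2;      /* thread 2: store 1, overwriting thread 1 */

    return shared;    /* 1, even though two increments ran */
}
```

An atomic read-modify-write rules out this interleaving, because no other thread can observe the variable between the load and the store.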
So how do you write an increment that is atomic but relaxed? In C11, you'll do it this way:
#include <stdatomic.h>
_Atomic(int) i;
...
atomic_fetch_add_explicit(&i, 1, memory_order_relaxed);
This is atomic but, much like a volatile access, it places no ordering constraints on the surrounding accesses to other memory locations.
Sequentially consistent and relaxed are the two memory orderings that you're most likely to use in your own code. If you only want atomic operations and really don't care about ordering (for example, for a simple monotonic counter), you should use memory_order_relaxed. Otherwise, start by using memory_order_seq_cst and relax the constraints later.
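A monotonic counter of the kind mentioned above might look like the following sketch. It assumes POSIX threads are available (build with -pthread); the names hits, worker, ITERATIONS, and count_with_two_threads are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 100000  /* hypothetical per-thread workload */

static _Atomic(int) hits;  /* shared counter, zero-initialized */

/* Each worker bumps the counter; only atomicity is needed,
   so relaxed ordering is sufficient. */
static void *worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < ITERATIONS; n++)
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

/* Runs two workers and returns the final count. */
int count_with_two_threads(void)
{
    pthread_t t1, t2;

    atomic_store(&hits, 0);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);  /* joining orders the workers' writes */
    pthread_join(t2, NULL);  /* before the final load below */

    return atomic_load(&hits);
}
```

No increment is lost, so the result is always 2 * ITERATIONS. Relaxed ordering is enough here because nothing else in the program depends on when each individual increment becomes visible.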
The worst that can happen if you specify sequential consistency is a performance penalty. The worst that can happen if you specify a more relaxed ordering than you meant to is that your code is subtly wrong. Worse still, if you test it on a strongly ordered architecture such as x86, it's likely to work, and then fail subtly when you port it to a weakly ordered one such as ARM.
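One common place where an over-relaxed ordering goes wrong is a flag that publishes ordinary data to another thread. The sketch below uses the safe seq_cst default and notes in comments what the relaxed variant would break. It assumes POSIX threads (build with -pthread); payload, ready, producer, consumer, and publish_and_read are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;         /* ordinary, non-atomic data */
static _Atomic(int) ready;  /* flag that publishes the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* seq_cst store: the payload write cannot be reordered after it.
       With memory_order_relaxed here (and on the load below), the
       consumer could legally see ready == 1 and a stale payload. */
    atomic_store(&ready, 1);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load(&ready) == 0)
        ;                   /* spin until the flag is set */
    *out = payload;         /* safe: ordered after the flag load */
    return NULL;
}

/* Runs one producer and one consumer; returns what the consumer read. */
int publish_and_read(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With the default ordering, the consumer always reads 42. A relaxed version of the same code would probably also appear to work on x86, which is exactly why the mistake is easy to miss.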