std::atomic from source to assembly, from gcc's perspective
intro
C++11 introduced a lot of interesting features, including std::atomic. On reflection, std::atomic is really a machine-dependent feature: different architectures implement it in their own ways, especially RISC architectures such as ARM compared with CISC ones such as x86. There may even be architectures that don't support it at all.
So how do compilers deal with it?
the += operator overload
In the standard library, the atomic += operator is overloaded as the following function:
__int_type
operator+=(__int_type __i) noexcept
{ return __atomic_add_fetch(&_M_i, __i, int(memory_order_seq_cst)); }
The conspicuous double underscore suggests that the function name is reserved for the compiler, as stated in the [C++ reference]( https://en.cppreference.com/c/language/identifier#:~:text=Note%3A in C%2B%2B%2C identifiers ,a%20double%20underscore%20are%20reserved.):
Note: in C++, identifiers with a double underscore anywhere are reserved everywhere; in C, only the ones that begin with a double underscore are reserved.
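To make the equivalence concrete, here is a minimal sketch that assumes GCC (or another compiler that provides the __atomic builtins). The helper names add_with_operator, add_with_builtin and check_equivalence are made up for this example and do not come from the library.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: goes through std::atomic's overloaded operator+=.
int add_with_operator(std::atomic<int>& counter, int delta)
{
    return counter += delta;  // yields the value after the addition
}

// Hypothetical helper: calls the builtin directly on a plain int.
int add_with_builtin(int* counter, int delta)
{
    return __atomic_add_fetch(counter, delta, __ATOMIC_SEQ_CST);
}

// Both paths should report the same "value after the addition".
void check_equivalence()
{
    std::atomic<int> a{40};
    int b = 40;
    assert(add_with_operator(a, 2) == add_with_builtin(&b, 2));
    assert(a.load() == b);
}
```

Calling check_equivalence() from main is enough to exercise both paths; comparing the disassembly of the two helpers is a quick way to see whether they end up as the same instruction on your target.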
identifier recognition
__atomic_add_fetch is defined in sync-builtins.def, which is included by builtins.def:
DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ADD_FETCH_N,
"__atomic_add_fetch",
BT_FN_VOID_VAR, ATTR_NOTHROWCALL_LEAF_LIST)
In the c_define_builtins() function, every builtin function is registered by name ("__atomic_add_fetch" in our case), so that the front end can recognize the identifier.
/* Build builtin functions common to both C and C++ language
frontends. */
static void
c_define_builtins (tree va_list_ref_type_node, tree va_list_arg_type_node)
{
///...
c_init_attributes ();
#define DEF_BUILTIN(ENUM, NAME, CLASS, TYPE, LIBTYPE, BOTH_P, FALLBACK_P, \
NONANSI_P, ATTRS, IMPLICIT, COND) \
if (NAME && COND) \
def_builtin_1 (ENUM, NAME, CLASS, \
builtin_types[(int) TYPE], \
builtin_types[(int) LIBTYPE], \
BOTH_P, FALLBACK_P, NONANSI_P, \
built_in_attributes[(int) ATTRS], IMPLICIT);
#include "builtins.def"
targetm.init_builtins ();
///...
}
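The trick of redefining DEF_BUILTIN and then including builtins.def is the classic X-macro pattern. Below is a heavily reduced sketch of the idea, not gcc's actual code: a list macro stands in for the .def file so that the example is self-contained, and every SKETCH_/sketch_ name is invented for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for builtins.def: each entry is (enum value, name).
#define SKETCH_BUILTINS(DEF)                               \
    DEF(SKETCH_ATOMIC_ADD_FETCH_N, "__atomic_add_fetch")   \
    DEF(SKETCH_ATOMIC_ADD_FETCH_1, "__atomic_add_fetch_1") \
    DEF(SKETCH_ATOMIC_ADD_FETCH_2, "__atomic_add_fetch_2")

// First expansion of the list: build the enum of function codes.
enum sketch_builtin_id {
#define AS_ENUM(ENUM, NAME) ENUM,
    SKETCH_BUILTINS(AS_ENUM)
#undef AS_ENUM
    SKETCH_BUILTIN_COUNT
};

// Second expansion of the same list: "register" each builtin by name.
static const char* const sketch_builtin_names[] = {
#define AS_NAME(ENUM, NAME) NAME,
    SKETCH_BUILTINS(AS_NAME)
#undef AS_NAME
};

// Look a name up; returns SKETCH_BUILTIN_COUNT when it is not registered.
sketch_builtin_id lookup_sketch_builtin(const char* name)
{
    for (int i = 0; i < SKETCH_BUILTIN_COUNT; ++i)
        if (std::strcmp(sketch_builtin_names[i], name) == 0)
            return static_cast<sketch_builtin_id>(i);
    return SKETCH_BUILTIN_COUNT;
}

void check_lookup()
{
    assert(lookup_sketch_builtin("__atomic_add_fetch") == SKETCH_ATOMIC_ADD_FETCH_N);
    assert(lookup_sketch_builtin("not_a_builtin") == SKETCH_BUILTIN_COUNT);
}
```

The point of the pattern is that the list is written once and expanded several times, so the enum and the name table cannot drift apart.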
implicit function code conversion
Surprisingly, expand_builtin(), which is supposed to expand the call, has no handler for BUILT_IN_ATOMIC_ADD_FETCH_N. After some debugging, I found that resolve_overloaded_builtin(), called from finish_call_expr(), converts the function code in a rather clandestine way: it adds exact_log2 (n) + 1 to orig_code, where n is the operand size in bytes. This means that __atomic_add_fetch, __atomic_add_fetch_1, __atomic_add_fetch_2, __atomic_add_fetch_4, __atomic_add_fetch_8 and __atomic_add_fetch_16 must be declared consecutively so that their enum values can be computed with that expression, which is indeed how they appear in sync-builtins.def.
/* Some builtin functions are placeholders for other expressions. This
function should be called immediately after parsing the call expression
before surrounding code has committed to the type of the expression.
LOC is the location of the builtin call.
FUNCTION is the DECL that has been invoked; it is known to be a builtin.
PARAMS is the argument list for the call. The return value is non-null
when expansion is complete, and null if normal processing should
continue. */
tree
resolve_overloaded_builtin (location_t loc, tree function,
vec<tree, va_gc> *params, bool complain)
{
///...
fncode = (enum built_in_function)((int)orig_code + exact_log2 (n) + 1);
new_function = builtin_decl_explicit (fncode);
///...
}
This means that __atomic_add_fetch((int*)i, 1, 1) and __atomic_add_fetch((long*)i, 1, 1) resolve to BUILT_IN_ATOMIC_ADD_FETCH_4 and BUILT_IN_ATOMIC_ADD_FETCH_8 respectively (with a 4-byte int and an 8-byte long), even though they share the same function name.
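The arithmetic is easy to check with a small model. This is only a sketch of the enum-offset idea: the enum, sketch_exact_log2 and resolve_sized are hypothetical stand-ins, not gcc's definitions, and the constexpr loop needs C++14 or later.

```cpp
#include <cstddef>

// Hypothetical stand-in for the real enum: the sized variants must directly
// follow the generic _N entry, in increasing size order.
enum sketch_fncode {
    SKETCH_ADD_FETCH_N,
    SKETCH_ADD_FETCH_1,
    SKETCH_ADD_FETCH_2,
    SKETCH_ADD_FETCH_4,
    SKETCH_ADD_FETCH_8,
    SKETCH_ADD_FETCH_16
};

// Simplified exact_log2: log2(n) when n is a power of two, otherwise -1.
constexpr int sketch_exact_log2(std::size_t n)
{
    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    int log = 0;
    while (n > 1) {
        n >>= 1;
        ++log;
    }
    return log;
}

// Mirrors fncode = orig_code + exact_log2 (n) + 1, with n the size in bytes.
constexpr sketch_fncode resolve_sized(sketch_fncode generic, std::size_t n)
{
    return static_cast<sketch_fncode>(static_cast<int>(generic)
                                      + sketch_exact_log2(n) + 1);
}

static_assert(resolve_sized(SKETCH_ADD_FETCH_N, 1) == SKETCH_ADD_FETCH_1, "1 byte");
static_assert(resolve_sized(SKETCH_ADD_FETCH_N, 4) == SKETCH_ADD_FETCH_4, "4 bytes");
static_assert(resolve_sized(SKETCH_ADD_FETCH_N, 8) == SKETCH_ADD_FETCH_8, "8 bytes");
static_assert(resolve_sized(SKETCH_ADD_FETCH_N, 16) == SKETCH_ADD_FETCH_16, "16 bytes");
```

If the sized entries were reordered or another entry were inserted between them, the static_asserts would fail, which is exactly the ordering constraint described above.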
convert to rtx
The call chain for BUILT_IN_ATOMIC_ADD_FETCH_8 is easy to find and turns out to be quite straightforward: expand_builtin => expand_builtin_atomic_fetch_op => expand_atomic_fetch_op => expand_atomic_fetch_op_no_fallback => get_atomic_op_for_code. For addition, the optab table is filled in by the case PLUS branch:
/* Fill in structure pointed to by OP with the various optab entries for an
operation of type CODE. */
static void
get_atomic_op_for_code (struct atomic_op_functions *op, enum rtx_code code)
{
gcc_assert (op!= NULL);
/* If SWITCHABLE_TARGET is defined, then subtargets can be switched
in the source code during compilation, and the optab entries are not
computable until runtime. Fill in the values at runtime. */
switch (code)
{
case PLUS:
op->mem_fetch_before = atomic_fetch_add_optab;
op->mem_fetch_after = atomic_add_fetch_optab;
op->mem_no_result = atomic_add_optab;
op->fetch_before = sync_old_add_optab;
op->fetch_after = sync_new_add_optab;
op->no_result = sync_add_optab;
op->reverse_code = MINUS;
break;
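A side note on the table above: it lists both a fetch_before and a fetch_after variant together with a reverse_code. My reading, which is an assumption rather than something traced through the source here, is that when a target offers only one of the two variants, the other result can be recovered by compensating with the operation or its reverse. A minimal sketch of that idea, with made-up helper names:

```cpp
#include <cassert>

// Hypothetical: only the "after" primitive is available, so the old value
// is recovered by applying the reverse operation (MINUS for PLUS).
int fetch_add_from_add_fetch(int* mem, int val)
{
    int after = __atomic_add_fetch(mem, val, __ATOMIC_SEQ_CST);
    return after - val;
}

// Hypothetical: only the "before" primitive is available, so the new value
// is recovered by re-applying the operation itself.
int add_fetch_from_fetch_add(int* mem, int val)
{
    int before = __atomic_fetch_add(mem, val, __ATOMIC_SEQ_CST);
    return before + val;
}

void check_compensation()
{
    int x = 10;
    assert(fetch_add_from_add_fetch(&x, 5) == 10);  // old value
    assert(x == 15);
    assert(add_fetch_from_fetch_add(&x, 5) == 20);  // new value
    assert(x == 20);
}
```

The compensation happens on a private copy of the result, so the memory update itself stays a single atomic operation.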
The corresponding entry for atomic_fetch_add_optab is in optabs.def; moreover, its second parameter names the machine-description pattern that implements the operation:
OPTAB_D (atomic_exchange_optab, "atomic_exchange$I$a")
OPTAB_D (atomic_fetch_add_optab, "atomic_fetch_add$I$a")
OPTAB_D (atomic_fetch_and_optab, "atomic_fetch_and$I$a")
machine instruction
i386
The instruction is described in sync.md:
;; For operand 2 nonmemory_operand predicate is used instead of
;; register_operand to allow combiner to better optimize atomic
;; additions of constants.
(define_insn "atomic_fetch_add<mode>"
[(set (match_operand:SWI 0 "register_operand" "=<r>")
(unspec_volatile:SWI
[(match_operand:SWI 1 "memory_operand" "+m")
(match_operand:SI 3 "const_int_operand")] ;; model
UNSPECV_XCHG))
(set (match_dup 1)
(plus:SWI (match_dup 1)
(match_operand:SWI 2 "nonmemory_operand" "0")))
(clobber (reg:CC FLAGS_REG))]
"TARGET_XADD"
"lock{%;} %K3xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
arm
The ARM architecture has no such instruction, so expand_atomic_fetch_op takes a fallback path. In ARM's case, the classic compare-and-swap loop is adopted:
/* This function expands an atomic fetch_OP or OP_fetch operation:
TARGET is an option place to stick the return value. const0_rtx indicates
the result is unused.
atomically fetch MEM, perform the operation with VAL and return it to MEM.
CODE is the operation being performed (OP)
MEMMODEL is the memory model variant to use.
AFTER is true to return the result of the operation (OP_fetch).
AFTER is false to return the value before the operation (fetch_OP). */
rtx
expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code,
enum memmodel model, bool after)
{
///...
/* If nothing else has succeeded, default to a compare and swap loop. */
if (can_compare_and_swap_p (mode, true))
{
rtx_insn *insn;
rtx t0 = gen_reg_rtx (mode), t1;
start_sequence ();
/* If the result is used, get a register for it. */
if (!unused_result)
{
if (!target || !register_operand (target, mode))
target = gen_reg_rtx (mode);
/* If fetch_before, copy the value now. */
if (!after)
emit_move_insn (target, t0);
}
else
target = const0_rtx;
t1 = t0;
if (code == NOT)
{
t1 = expand_simple_binop (mode, AND, t1, val, NULL_RTX,
true, OPTAB_LIB_WIDEN);
t1 = expand_simple_unop (mode, code, t1, NULL_RTX, true);
}
else
t1 = expand_simple_binop (mode, code, t1, val, NULL_RTX, true,
OPTAB_LIB_WIDEN);
/* For after, copy the value now. */
if (!unused_result && after)
emit_move_insn (target, t1);
insn = end_sequence ();
if (t1 != NULL && expand_compare_and_swap_loop (mem, t0, t1, insn))
return target;
}
return NULL_RTX;
}
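At source level, the compare-and-swap fallback corresponds roughly to the loop below. This is a hand-written sketch of the technique using gcc's __atomic builtins, not what the compiler literally generates; the function name add_fetch_cas_loop is made up, and the variable comments map loosely onto t0 and t1 in the code above.

```cpp
#include <cassert>

// Hypothetical helper: add_fetch built from nothing but load + CAS.
int add_fetch_cas_loop(int* mem, int val)
{
    int expected = __atomic_load_n(mem, __ATOMIC_RELAXED);  // plays the role of t0
    int desired;
    do {
        desired = expected + val;                            // plays the role of t1
        // On failure, the builtin refreshes 'expected' with the current
        // contents of *mem, so the next iteration recomputes 'desired'.
    } while (!__atomic_compare_exchange_n(mem, &expected, desired,
                                          /*weak=*/true,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
    return desired;  // the "after" value, as for OP_fetch
}

void check_cas_loop()
{
    int x = 1;
    assert(add_fetch_cas_loop(&x, 2) == 3);
    assert(x == 3);
}
```

Returning expected instead of desired after the loop would give the fetch_OP flavour, which matches the after flag in the function above.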
verify
The test code is as follows:
tsecer@harry: cat atomicadd.cpp
void foo()
{
__atomic_add_fetch((int*)0, 1, 1);
}
i386
tsecer@harry: g++ -c atomicadd.cpp
tsecer@harry: objdump -d atomicadd.o
atomicadd.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z3foov>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: f0 83 04 25 00 00 00 lock addl $0x1,0x0
f: 00 01
11: 90 nop
12: 5d pop %rbp
13: c3 ret
tsecer@harry:
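Note that the test above throws the result away, and the disassembly shows lock addl rather than the xadd from the pattern in sync.md. A variant that actually uses the return value is sketched below; I would expect gcc to pick an xadd-based sequence for it, but that is an assumption and should be confirmed with objdump on your own toolchain. The function names are made up for this example.

```cpp
#include <cassert>

// Hypothetical variant of foo(): the caller consumes the new value, so the
// compiler can no longer treat the result as unused.
int add_and_return(int* counter)
{
    return __atomic_add_fetch(counter, 1, __ATOMIC_SEQ_CST);
}

void check_add_and_return()
{
    int c = 0;
    assert(add_and_return(&c) == 1);
    assert(add_and_return(&c) == 2);
    assert(c == 2);
}
```

Compiling this with g++ -c and running objdump -d on the object file, as in the session above, is enough to compare the two cases.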
arm
Although I don't have an ARM compiler at hand, the result can be verified in Compiler Explorer with ARM GCC 16.1.0:
foo():
push {r7}
add r7, sp, #0
movs r3, #0
.L2:
ldrex r1, [r3]
add r1, r1, #1
strex r2, r1, [r3]
cmp r2, #0
bne .L2
dmb ish
nop
mov sp, r7
pop {r7}
bx lr
outro
This is only a very basic analysis of how gcc processes a builtin function, but it touches on the tree, GIMPLE, RTX and even md (machine description) stages. After all, every statement goes through these basic stages in the compiler, so it's pretty interesting to take a peek at what is going on under the hood, isn't it?