During trampoline generation, only the data mapping is flushed:
#define FFI_INIT_TRAMPOLINE(TRAMP,FUN,CTX,FLAGS)                        \
  ({unsigned char *__tramp = (unsigned char*)(TRAMP);                   \
    UINT64 __fun = (UINT64)(FUN);                                       \
    UINT64 __ctx = (UINT64)(CTX);                                       \
    UINT64 __flags = (UINT64)(FLAGS);                                   \
    memcpy (__tramp, trampoline, sizeof (trampoline));                  \
    memcpy (__tramp + 12, &__fun, sizeof (__fun));                      \
    memcpy (__tramp + 20, &__ctx, sizeof (__ctx));                      \
    memcpy (__tramp + 28, &__flags, sizeof (__flags));                  \
    ffi_clear_cache(__tramp, __tramp + FFI_TRAMPOLINE_SIZE);            \
  })
At least one aarch64 implementation needs a cache flush on the code mapping as well. I'm trying to get confirmation of whether this is a CPU erratum. After all, no other architecture behaves this way.
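
For illustration, here is a minimal sketch of what flushing both mappings could look like. It assumes the trampoline is written through one address and executed through a second, separately mapped address. The helper name flush_both_mappings and its parameters are hypothetical and not part of libffi; the only real call used is the GCC/Clang builtin __builtin___clear_cache. This is not a proposed patch, just a way to show the extra flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libffi.  Flushes the cache for the
   writable (data) view of a trampoline and, if the executable (code)
   view lives at a different address, for that view as well.
   Returns the number of ranges that were flushed (1 or 2).  */
static int
flush_both_mappings (void *data_addr, void *code_addr, size_t size)
{
  char *data = (char *) data_addr;
  char *code = (char *) code_addr;
  int flushed = 0;

  assert (data != NULL && code != NULL);

  /* This corresponds to what the macro above already does: flush the
     range that the trampoline bytes were written through.  */
  __builtin___clear_cache (data, data + size);
  flushed++;

  /* The additional step under discussion: flush the alias that the
     CPU will actually fetch instructions from.  */
  if (code != data)
    {
      __builtin___clear_cache (code, code + size);
      flushed++;
    }

  return flushed;
}
```

With a single mapping the helper degenerates to the current behaviour (one flush); with two distinct addresses it issues the second flush that the affected CPU appears to need.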