3

What is the difference in logic and performance between the x86 instructions LOCK XCHG and MOV+MFENCE when used to perform a sequentially consistent store?

(We ignore the load result of the XCHG; compilers other than gcc use it for the store + memory barrier effect.)

Is it true that, for sequential consistency, LOCK XCHG locks only a single cache line while the atomic operation executes, whereas MOV+MFENCE locks the whole L3 cache (LLC)?
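For reference, the two store forms being compared can be sketched in C++11 as below. This is only an illustration: which x86 instructions a compiler actually emits for each form (MOV+MFENCE or XCHG) depends on the compiler and version. The helper names `store_plain`, `store_via_exchange`, and `count_sb_violations` are made up for this sketch. The last function is a store-buffering litmus test: with sequentially consistent stores and loads, the outcome where both threads read 0 is not allowed, whichever store form is used.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Form 1: a plain sequentially consistent store.
inline void store_plain(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_seq_cst);
}

// Form 2: an exchange whose loaded (old) value is deliberately ignored.
inline void store_via_exchange(std::atomic<int>& a, int v) {
    (void)a.exchange(v, std::memory_order_seq_cst);
}

// Store-buffering litmus test (hypothetical helper).
// Each thread stores 1 to its own variable, then loads the other one.
// Returns the number of runs in which both loads observed 0, an outcome
// that sequential consistency forbids.
inline int count_sb_violations(int iterations, bool use_exchange) {
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            if (use_exchange) store_via_exchange(x, 1);
            else              store_plain(x, 1);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            if (use_exchange) store_via_exchange(y, 1);
            else              store_plain(y, 1);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++violations;
    }
    return violations;
}
```

Calling `count_sb_violations(200, false)` or `count_sb_violations(200, true)` should return 0 on a conforming implementation. Weakening the stores to `std::memory_order_release` would make the both-zero outcome permissible.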

10
  • 2
    Apples and oranges, MFENCE doesn't provide atomicity. Commented Sep 30, 2013 at 14:10
  • 2
    @Hans Passant I didn't say that MFENCE provides atomicity, because MOV is already atomic. We can see this in C11 (atomic) / C++11 (std::atomic) for all orderings on x86 except SC (sequential consistency); see en.cppreference.com/w/cpp/atomic/memory_order What I said is that MFENCE provides sequential consistency for atomic variables, as we can see in C11 (atomic) / C++11 (std::atomic) with GCC 4.8.2: stackoverflow.com/questions/19047327/… Commented Sep 30, 2013 at 14:37
  • 1
    (I'm not even sure if mov is atomic for unaligned access, by the way.) Commented Sep 30, 2013 at 14:57
  • 2
    @Kerrek SB For SC, MOV+MFENCE (what GCC 4.8.2 emits) can be replaced with LOCK XCHG, as we can see in the video, where at 0:28:20 it is said that MFENCE is more expensive than XCHG: channel9.msdn.com/Shows/Going+Deep/… Commented Sep 30, 2013 at 15:18
  • 1
    @Alex, see also here - stackoverflow.com/questions/19059542/… Commented Sep 30, 2013 at 17:46

1 Answer

-1

The difference is in the intended use.

MFENCE (or SFENCE or LFENCE) is useful when we are locking a memory region that is accessible from two or more threads. Once we have atomically taken the lock for this region, we can use ordinary non-atomic instructions on it, because they are faster. But we must issue SFENCE (or MFENCE) one instruction before unlocking the region to ensure that the locked memory is visible correctly to all other threads.

If we are changing only a single aligned variable, we use an atomic instruction such as LOCK XCHG, so no lock on a memory region is needed.
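The two usage patterns described above can be sketched in portable C++11 roughly as follows. This is an illustrative sketch, not code from the answer: `SpinLock`, `locked_sum`, and `atomic_sum` are hypothetical names. Note that the sketch expresses the unlock ordering with `std::memory_order_release` rather than an explicit fence instruction; whether an explicit SFENCE/MFENCE is needed before unlocking on x86 is disputed in the comments on this answer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock: acquire ordering on lock, release on unlock.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Atomic read-modify-write; spin until the flag was previously clear.
        while (flag_.test_and_set(std::memory_order_acquire)) { }
    }
    void unlock() {
        // Release store: earlier writes become visible to the next locker.
        flag_.clear(std::memory_order_release);
    }
};

// Pattern 1: a lock protects a region that is updated with ordinary,
// non-atomic operations.
inline long locked_sum(int threads, int per_thread) {
    SpinLock lock;
    long total = 0;  // plain variable, only touched while holding the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Pattern 2: a single aligned variable is updated with an atomic
// read-modify-write, so no lock is needed.
inline long atomic_sum(int threads, int per_thread) {
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

Both functions should return `threads * per_thread`; for example, `locked_sum(4, 1000)` and `atomic_sum(4, 1000)` should each yield 4000.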

No, an x86 lock is in itself an mfence (it's even said in the video here), so you don't need another one (let alone any one-directional fence at entry/exit of critical sections). Also, there's no such thing as locking the L3; mfence does not lock anything (so it does not ensure any atomicity), it just ensures serialization of all memory operations in the thread that used it.
@Alex, I think you got it mixed up a little - fences are creatures of the ISA, x86 in this case. Caches are implementation details, and are mostly "under the hood". Any x86 load/store operation will collect coherent data from other cores/sockets thanks to a MESI/snoop protocol. Modified lines in your own core are also maintained by that protocol (although there is an ISA hook to flush them out - but that's with wbinvd/clflush, not sfence). Either way, the exact behavior of the HW may differ between products (but most modern CPUs don't have to go with expensive bus locks for these ops).
MESIF/MOESI allow some optimization in HW, but are not relevant here - a lock will hold any line in place regardless of state. However, I don't agree with your 2nd part - MFENCE applies only for the program order in a given thread, not others. It may help in some consistency cases (as I wrote here - stackoverflow.com/questions/19059542/…), but only because it serializes each thread internally, not through any atomicity, or "cache locking" as you insinuate. If you think otherwise, please open a question with an example.
@Alex SFENCE is an ordered flush of local outstanding writes to shared memory. Two cores can do simultaneous SFENCE and a third core will see the writes interleaved. Intel says: "Writes from an individual processor are NOT ordered with respect to the writes from other processors."
In addition to being wrong, this doesn't answer the question.
