SelfContainedMutex: Strengthen compiler fence by sporksmith · Pull Request #2552 · shadow/shadow

sporksmith · 2022-11-18T01:53:57Z

Using just Acquire here still lets the compiler move writes before the fence to after the fence. We need both Acquire and Release semantics.

Pointed out here:
https://masto.ai/@rfc1149/109359308547583544

And handy visual aid here:
https://masto.ai/@rfc1149/109359347916681723

Using just `Acquire` here still lets the compiler move writes before the fence to after the fence. We need both `Acquire` and `Release` semantics. Pointed out here: https://masto.ai/@rfc1149/109359308547583544 And handy visual aid here: https://masto.ai/@rfc1149/109359347916681723

stevenengler · 2022-11-18T02:08:40Z

src/lib/shadow-shim-helper-rs/src/scmutex.rs

-        // Ensure that the `futex.store` above can't be moved after the
-        // `sleepers.load` below.
-        std::sync::atomic::compiler_fence(Ordering::Acquire);
+        // Acquire: Ensure that the `sleepers.load` below can't be moved to before
+        // this fence (and therefore before the `futex.store` above)
+        // Release: Ensure that the `futex.store` above can't be moved to after
+        // this fence (and therefore not after the `sleepers.load`)
+        std::sync::atomic::compiler_fence(Ordering::AcqRel);


I'm not clear on why this is a compiler fence and not a memory fence? We can tell the compiler not to reorder the memory accesses, but that doesn't mean the hardware won't reorder them.

Yeah, I think you're right.

I spent some time last night seeing if I could get a test up and running with loom. Unfortunately I ran into a crash in loom. After creating a smaller repro of that crash, I think it only happens when the code under test has a deadlock bug. So this gives some information, but doesn't give me a lot of diagnostics to find the details of the deadlock until the loom bug is fixed.

If I change this part of the code to unconditionally wake, the tests pass. I also tried changing this to be an atomic::fence, and changing all the orderings to use AcqRel, but I still get the crash (and hence presumably a possible deadlock) in those cases.

Digging in some more. I had an alternate design idea where I keep the sleeper count inside the same atomic as the futex word. It'd add some fiddly shifting and masking, and I think perform worse under contention (since multiple threads might race each-other trying to update the sleeper count and then sleep on the futex), but is easier to reason about concurrency-wise. If I can't find the problem in this version of the design maybe I'll take another look at doing it that way...

codecov · 2022-11-18T02:15:02Z

Codecov Report

Base: 67.28% // Head: 67.25% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (43c7e3f) compared to base (e10e2d4).
Patch coverage: 97.56% of modified lines in pull request are covered.

❗ Current head 43c7e3f differs from pull request most recent head 41030ed. Consider uploading reports for the commit 41030ed to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2552      +/-   ##
==========================================
- Coverage   67.28%   67.25%   -0.03%     
==========================================
  Files         193      194       +1     
  Lines       28608    28621      +13     
  Branches     5615     5618       +3     
==========================================
+ Hits        19248    19250       +2     
- Misses       4923     4936      +13     
+ Partials     4437     4435       -2

Flag	Coverage Δ
tests	`67.25% <97.56%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/lib/shadow-shim-helper-rs/src/shim_shmem.rs	`78.24% <ø> (ø)`
src/lib/shadow-shim-helper-rs/src/scmutex.rs	`87.64% <97.50%> (+0.89%)`	⬆️
src/lib/shadow-shim-helper-rs/tests/loom_tests.rs	`100.00% <100.00%> (ø)`
src/main/core/scheduler/pools/unbounded.rs	`77.90% <0.00%> (-4.07%)`	⬇️
src/main/host/syscall/handler/unistd.rs	`52.78% <0.00%> (-3.87%)`	⬇️
src/main/utility/shm_cleanup.rs	`59.55% <0.00%> (-2.25%)`	⬇️
src/main/host/descriptor/pipe.rs	`84.17% <0.00%> (-1.90%)`	⬇️
src/main/host/syscall/handler/mod.rs	`76.00% <0.00%> (-1.34%)`	⬇️
src/main/host/descriptor/mod.rs	`65.35% <0.00%> (-0.66%)`	⬇️
src/main/host/syscall/handler/socket.rs	`67.97% <0.00%> (-0.19%)`	⬇️
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

sporksmith · 2022-11-19T16:27:19Z

Closing in favor of a pending more-complete fix. See #2557

sporksmith added this to the Code health and maintenance milestone Nov 18, 2022

sporksmith self-assigned this Nov 18, 2022

sporksmith requested a review from stevenengler November 18, 2022 01:54

sporksmith enabled auto-merge (squash) November 18, 2022 01:54

github-actions bot added the Component: Libraries Support functions like LD_PRELOAD and logging label Nov 18, 2022

stevenengler reviewed Nov 18, 2022

View reviewed changes

github-actions bot added the Component: Build Build/install tools and dependencies label Nov 18, 2022

sporksmith force-pushed the mutex-order branch from 43c7e3f to 41030ed Compare November 18, 2022 04:17

github-actions bot removed the Component: Build Build/install tools and dependencies label Nov 18, 2022

sporksmith disabled auto-merge November 18, 2022 16:46

sporksmith mentioned this pull request Nov 19, 2022

Deadlock in SelfContainedMutex #2557

Closed

sporksmith closed this Nov 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SelfContainedMutex: Strengthen compiler fence#2552

SelfContainedMutex: Strengthen compiler fence#2552
sporksmith wants to merge 1 commit intoshadow:mainfrom
sporksmith:mutex-order

sporksmith commented Nov 18, 2022

Uh oh!

stevenengler Nov 18, 2022

Uh oh!

sporksmith Nov 18, 2022

Uh oh!

codecov bot commented Nov 18, 2022 •

edited

Loading

Uh oh!

sporksmith commented Nov 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sporksmith commented Nov 18, 2022

Uh oh!

stevenengler Nov 18, 2022

Choose a reason for hiding this comment

Uh oh!

sporksmith Nov 18, 2022

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Nov 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sporksmith commented Nov 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Nov 18, 2022 •

edited

Loading