Fix race conditions on disconnects#250
Conversation
…34756) Fix from: bitcoin/bitcoin#34711 (comment) Also reported in: bitcoin/bitcoin#34756
Fix from: bitcoin/bitcoin#34782 (comment) Failure is also triggered by antithesis bitcoin/bitcoin#34782 (comment)
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process. |
|
I made some adjustments to the tests in #249:
Opened a fresh PR for a full CI run. @ryanofsky feel free to take the next turn and push to your PR, or I can address remaining issues here. |
The LLM came up with a third test, but beyond reproducing the original issue, it doesn't seem very useful. So I omitted it in this PR. See Sjors@7c59dac |
|
Thanks for updating this! I opened bitcoin/bitcoin#34804 to see how this interacts with bitcoin core CI. (I wonder if there is a way we automatically run bitcoin core CI on every libmultiprocess PR, or run it more easily. Might be something to look into).
Didn't look very closely but the test seems clean and I'd be inclined to add it since I already know I want to follow up this PR by duplicating the disconnect tests and maybe reviving my earlier attempt. If the test turns out not to be meaningful or is a maintenance burden we can remove it. Anyway I plan to review this soon |
I think you can just copy-paste this yaml: https://github.com/bitcoin-core/qa-assets/blob/main/.github/workflows/ci.yml#L1 (The qa-assets checkout can be removed) |
|
I pushed the third test. (oh oops, let me check the CI error...) |
Add testing_hook_makethread that fires on the worker thread between the Lock and set_value operations in makeThread. A checker thread uses try_lock to verify the waiter mutex is held at that point. With the fix (Lock before set_value) the mutex is held, so try_lock fails. Without the fix (set_value before Lock) the hook fires before Lock, so try_lock succeeds -- indicating the race window exists.
|
Fixed missing include. |
|
ACK 3f28bca for the fix commits by @ryanofsky. |
|
Looks like the subtree update PR that includes this, in our CI, still has heap-use-after-free / data race issues: bitcoin/bitcoin#34804 (comment). |
Commented in the other PR, but those are both caused by this line in newly added test code. I'm currently going through the PR and improving / rewriting the tests now. I will probably reopen #249 with the updates then refresh bitcoin/bitcoin#34804 |
|
Ok, I'll close this one.
The first two commits also had issues in their test in earlier versions. I didn't thoroughly check the third commit because I initially didn't use it. |
|
Thanks! I wound up changing a bunch of things in the tests too, but they were extremely helpful to have as a starting point |
The PR fixes 3 race conditions on disconnects that were detected in Bitcoin core CI runs and by antithesis:
capnp::CallContext<ipc::capnp::messages::BlockTemplate::GetBlockParams, ipc::capnp::messages::BlockTemplate::GetBlockResults>::getParams()bitcoin/bitcoin#34777 (comment)