mempool: rework lock discipline to mitigate callback deadlocks #9030
Merged
creachadair merged 2 commits into v0.35.x on Jul 19, 2022
Conversation
29b4a2f to 2213c80
creachadair pushed a commit that referenced this pull request on Jul 18, 2022:
(experimental) This is a preliminary manual backport of #9030.
- Handle all locking internally (for v1 only).
- Process initial ABCI checks inline (synchronously).
Synchronous processing doesn't change the order of operations: the local client processes async calls immediately anyway, and the socket client must respond in strict request order.
creachadair pushed a commit that referenced this pull request on Jul 18, 2022.
5cb951a to 1b49aee
creachadair pushed a commit that referenced this pull request on Jul 19, 2022:
(manual backport of #9030)
1b49aee to cac564f
williambanfield (Contributor) approved these changes on Jul 19, 2022:
This looks like a great solution to the problem of two async requests trying to grab the same locks.
creachadair pushed a commit that referenced this pull request on Jul 19, 2022:
…ort #9030) (manual cherry-pick of commit 22ed610)
cmwaters pushed a commit that referenced this pull request on Jul 20, 2022.
cmwaters pushed a commit to cmwaters/tendermint that referenced this pull request on Dec 12, 2022:
…ort tendermint#9030) (tendermint#9033) (manual cherry-pick of commit 22ed610)
The priority mempool has a stricter synchronization requirement than the legacy
mempool. Under sufficiently-heavy load, exclusive access can lead to deadlocks
when processing a large batch of transaction rechecks through an out-of-process
application using the socket client.
By design, a socket client stalls when its send buffer fills, during which time
it holds a lock shared with the receive thread. While blocked in this state, a
response read by the receive thread waits for the shared lock so the callback
can be invoked.
If we're lucky, the server will then read the next request and make enough room
in the buffer for the sender to proceed. If not however (e.g., if the next
request is bigger than the one just consumed), the receive thread is blocked:
It is waiting on the lock and cannot read a response. Once the server's output
buffer fills, the system deadlocks.
This can happen with any sufficiently-busy workload, but is more likely during
a large recheck in the v1 mempool, where the callbacks need exclusive access to
mempool state. As a workaround, process rechecks for the priority mempool in
their own goroutines outside the mempool mutex. Responses still head-of-line
block, but will no longer get pushback due to contention on the mempool itself.
General notes