kvserver: reject requests when quiescing #52843
andreimatei wants to merge 2 commits into cockroachdb:master
Conversation
This patch makes the Store reject requests once its stopper is quiescing.

Before this patch, we didn't seem to have good protection against requests running after the stopper has been stopped. We've seen this in some tests, where requests were racing with the engine closing. Running after the stopper has stopped is generally undefined behavior, so let's avoid it. I think the reason we didn't see many errors from such races is that we stop the gRPC server even before we start quiescing, so at least for requests coming from remote nodes we had some draining window.

This is a lighter version of cockroachdb#51566. That patch tried to run requests as tasks, so that they properly synchronize with server shutdown. This patch still allows races between requests that started evaluating before the server started quiescing, and server shutdown.

Touches cockroachdb#51544

Release note: None
This patch runs, as tasks, some infrequent operations that might use the storage engine, thus synchronizing them with server shutdown. In cockroachdb#51544 we've seen one of these cause a crash by executing after Pebble was shut down.

Release note: None
nvb left a comment:
Thanks for simplifying this. This is better, but unfortunately even this seems too heavyweight to me. Forcing every request in the system to synchronize on a single mutex just so we can have a best-effort check for quiescence does not seem like a particularly good trade-off. Even these short-lived mutex acquisitions impose a non-negligible performance cost, especially if they're globally scoped. We see this today in profiles, and we're nowhere near as efficient as we eventually will be.
As a general design principle when approaching process draining, I think we need to heavily optimize for the request serving path, potentially even at the expense of the draining path. A process will serve millions/billions of requests over its lifetime and will only drain once. So we should make things as cheap as possible for requests. If we want a best-effort check, we should use an atomic read, ideally on a variable isolated in its own cache line. If we want to wait for requests to drain, we should use an atomic counter, which is already what we're doing with the sync.WaitGroup in Stopper. If we need to do something fancier, let's at least try to piggyback on top of existing draining mechanisms in gRPC.
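The best-effort check described above can be sketched with a single atomic load on the request path. This is an illustrative model, not the Store's real code; the field and method names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errQuiescing = errors.New("store is quiescing; request rejected")

// Store models a request-serving component. The quiescing flag is read
// with one atomic load per request; in a real implementation one might
// pad it onto its own cache line to avoid false sharing with other hot
// fields.
type Store struct {
	quiescing int32
}

// SetQuiescing marks the store as draining. Called once, at shutdown.
func (s *Store) SetQuiescing() { atomic.StoreInt32(&s.quiescing, 1) }

// Send performs a best-effort quiescence check before serving. A
// request can still race past the check, but the common path costs a
// single wait-free atomic read instead of a global mutex.
func (s *Store) Send(req string) error {
	if atomic.LoadInt32(&s.quiescing) == 1 {
		return errQuiescing
	}
	fmt.Println("served:", req)
	return nil
}

func main() {
	s := &Store{}
	_ = s.Send("get(a)")
	s.SetQuiescing()
	if err := s.Send("get(b)"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The trade-off matches the comment above: the drain path gets no hard guarantee from this check alone, but the serving path pays essentially nothing.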
…eation

This commit optimizes the Stopper for task creation by ripping out the existing heavyweight task tracking in production builds. I realized that my biggest concern with most of the proposals (cockroachdb#52843 and cockroachdb#51566) being floated to address cockroachdb#51544 was that they bought more into the inefficient tracking in the Stopper, not that they were doing anything inherently wrong themselves.

Before this change, creating a task acquired an exclusive mutex and then wrote to a hashmap. At high levels of concurrency, this would have become a performance chokepoint. After this change, the cost of launching a task is three atomic increments: one to acquire a read lock, one to register with a WaitGroup, and one to release the read lock. When no one is draining the Stopper, these are all wait-free operations, which means that task creation becomes wait-free.

With a change like this, I would feel much more comfortable pushing on Stopper tasks to solve cockroachdb#51544.
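The scheme described in this commit message can be modeled as below: task creation takes an RWMutex read lock, registers with a WaitGroup, and releases the read lock, while draining takes the write lock once and then waits. This is a simplified sketch, not the actual Stopper implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Stopper models wait-free task creation: when nobody holds the write
// lock, RLock, WaitGroup.Add, and RUnlock are each a single atomic
// operation with no waiting.
type Stopper struct {
	mu        sync.RWMutex // write-locked only once, by Stop
	quiescing bool         // guarded by mu
	tasks     sync.WaitGroup
}

// RunAsyncTask launches f as a tracked goroutine, returning false if
// the Stopper is already quiescing.
func (s *Stopper) RunAsyncTask(f func()) bool {
	s.mu.RLock()
	if s.quiescing {
		s.mu.RUnlock()
		return false
	}
	s.tasks.Add(1) // registered before releasing the read lock
	s.mu.RUnlock()
	go func() {
		defer s.tasks.Done()
		f()
	}()
	return true
}

// Stop flips the quiescing flag under the write lock, then waits for
// every in-flight task. New tasks are refused from this point on.
func (s *Stopper) Stop() {
	s.mu.Lock()
	s.quiescing = true
	s.mu.Unlock()
	s.tasks.Wait()
}

func main() {
	s := &Stopper{}
	done := make(chan struct{})
	accepted := s.RunAsyncTask(func() { close(done) })
	<-done
	s.Stop()
	refused := !s.RunAsyncTask(func() {})
	fmt.Println(accepted, refused) // true true
}
```

Taking `WaitGroup.Add` under the read lock is what makes the scheme sound: `Stop` cannot observe the flag as set and call `Wait` while a creator is between checking the flag and registering the task.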
This patch makes the Store reject requests once its stopper is
quiescing.
Before this patch, we didn't seem to have good protection against
requests running after the stopper has been stopped. We've seen this
in some tests, where requests were racing with the engine closing.
Running after the stopper has stopped is generally pretty undefined
behavior, so let's avoid it.
I think the reason why we didn't see a lot of errors from such races is
that we're stopping the gRPC server even before we start quiescing, so
at least for requests coming from remote nodes we had some draining
window.
This is a lighter version of #51566. That patch was trying to run
requests as tasks, so that they properly synchronize with server
shutdown. This patch still allows races between requests that started
evaluating before the server started quiescing, and server shutdown.
Touches #51544
Release note: None