server: avoid deadlock when initing additional stores #107124

Merged
craig[bot] merged 2 commits into cockroachdb:master from
tbg:TestAddNewStoresToExistingNodes-again
Jul 26, 2023

tbg commented Jul 18, 2023

We need to start node liveness before waiting for additional store init.

Otherwise, we can end up in a situation where every node is blocked on the
channel and nobody has started their liveness yet. The sender on the channel
first has to get an Increment through KV, but if nobody can acquire the
lease (since nobody's heartbeat loop is running), this will never happen.

In practice, there is usually no deadlock because the lease acquisition path
in most cases performs a synchronous heartbeat of the node's own liveness
entry (ignoring the fact that liveness hasn't been started yet). But there is
also another path where someone else's epoch needs to be incremented, and this
path also checks whether the node itself is live, which it won't necessarily
be (the liveness loop is not running yet).

Fixes #106706

Epic: None
Release note (bug fix): a rare (!) situation in which nodes would get stuck
during start-up was addressed. This is unlikely to have been encountered by
production users. If it was, it would manifest itself through a stack frame
sitting on a select in waitForAdditionalStoreInit for extended periods of
time (i.e., minutes).


Development

Successfully merging this pull request may close these issues.

server: TestAddNewStoresToExistingNodes failed

3 participants