server: fix a race in tenant creation #107666
Merged
craig[bot] merged 1 commit into cockroachdb:master on Jul 27, 2023
Previously, scanTenantsForRunnableServices() was not holding the mutex when SELECTing for the existing tenant names, which means that the following may happen:

- scanTenantsForRunnableServices() sees that only the system tenant exists
- createServerEntryLocked() then adds another tenant while holding the mutex
- scanTenantsForRunnableServices() takes the lock and stops the tenant that was just created because only the system tenant should be alive (which is wrong)

This patch changes scanTenantsForRunnableServices() to take the mutex before SELECTing for the existing tenants in order to avoid the race.

Epic: none
Fixes: cockroachdb#107434
Release note: None
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
let's merge this - i need it in a different PR too!

bors r+
This PR was included in a batch that timed out, it will be automatically retried
This was referenced Jul 27, 2023
Build failed (retrying...)
Build succeeded
craig bot pushed a commit that referenced this pull request Jul 28, 2023
107820: db-console: delete unused vars and enforce eslint rule r=maryliag a=xinhaoz

This commit turns violations of the eslint rule `no-unused-vars` into errors. It removes all unused vars in the db-console application.

Epic: none
Release note: None

107824: server: prevent deadlocks in server orchestration r=lidorcarmel,andrewbaptist a=knz

Fixes #107564.
Fixes #107791.
Supersedes #107666.

The previous fix in this area (5ca5703) correctly identified the case where `createServerEntryLocked()` was called concurrently with `scanTenantsForRunnableServices()`, in which case we ran the risk of immediately tearing down the new server because it hadn't been picked up by `getExpectedRunningTenants()`.

However, the fix was incorrect: it caused the controller mutex to be held through `getExpectedRunningTenants()`, which itself can hang. In that case, a cascading failure could result.

This patch changes the fix (and thus continues to solve the original problem) by ensuring we only look at entries to remove that existed prior to the call to `getExpectedRunningTenants()`. No mutex needs to be held here.

Release note: None
Epic: CRDB-28893

Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
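Previously, scanTenantsForRunnableServices() was not holding the mutex when SELECTing for the existing tenant names, which means that the following may happen:

- scanTenantsForRunnableServices() sees that only the system tenant exists
- createServerEntryLocked() then adds another tenant while holding the mutex
- scanTenantsForRunnableServices() takes the lock and stops the tenant that was just created because only the system tenant should be alive (which is wrong)

This patch changes scanTenantsForRunnableServices() to take the mutex before SELECTing for the existing tenants in order to avoid the race.
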
Epic: none
Fixes: #107434
Fixes: #107343
Fixes: #107154
Release note: None