kv: introduce a stopgap for lack of ReplicaState synchronization#59194
Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom Jan 20, 2021
Merged
Conversation
Member
tbg
approved these changes
Jan 20, 2021
Member
tbg
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
Contributor
Author
Blocked on CI like everyone else. TFTR! Will merge after things settle down. |
There's a scary lack of synchronization around how we set the ReplicaState for a given replica, and how we mark a replica as "initialized". What this means is that very temporarily, it's possible for the entry in Store.mu.replicas to be both "initialized" and have an empty ReplicaState. This was an existing problem, but is now more likely to bite us given the migrations infrastructure attempts to purge outdated replicas at start up time (when replicas are being initialized, and we're iterating through extan replicas in the Store.mu.replicas map). This issue has caused a bit of recent instability: cockroachdb#59180, cockroachdb#58489, \cockroachdb#58523, and cockroachdb#58378. While we work on a more considered fix to the problem (tracked in cockroachdb#58489), we can introduce stop the bleeding in the interim (and unskip some tests). Release note: None
fc314fb to
f47e1f3
Compare
Contributor
Author
|
bors r+ |
Contributor
|
Build succeeded: |
This was referenced Jan 21, 2021
irfansharif
added a commit
to irfansharif/cockroach
that referenced
this pull request
Feb 10, 2021
See cockroachdb#59194 and cockroachdb#58489 for more details. In cockroachdb#58489 we observed a scary lack of synchronization around how we set the ReplicaState for a given replica, and how we mark a replica as "initialized". What this meant is that it was possible for the entry in Store.mu.replicas to be both "initialized" and have an empty ReplicaState. This is now more likely to bite us given the migrations infrastructure attempts to purge outdated replicas at start up time (when replicas are being initialized, and we're iterating through extan replicas in the Store.mu.replicas map). We believed this was addressed as part of cockroachdb#58378, but that appears not to be the case. Lets re-introduce this stop-gap while we investigate. Release note: None
craig bot
pushed a commit
that referenced
this pull request
Feb 10, 2021
60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif See #59194 and #58489 for more details. In #58489 we observed a scary lack of synchronization around how we set the ReplicaState for a given replica, and how we mark a replica as "initialized". What this meant is that it was possible for the entry in Store.mu.replicas to be both "initialized" and have an empty ReplicaState. This is now more likely to bite us given the migrations infrastructure attempts to purge outdated replicas at start up time (when replicas are being initialized, and we're iterating through extan replicas in the Store.mu.replicas map). We believed this was addressed as part of #58378, but that appears not to be the case. Lets re-introduce this stop-gap while we investigate. Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot
pushed a commit
that referenced
this pull request
Feb 10, 2021
60281: sql/pgwire: send placeholder BackendKeyData r=asubiotto,spaskob a=rafiss fixes #13191 Some tools expect this message to be returned at connection time and will not connect without it. CockroachDB does not support pgwire cancellation, but we can still send a placeholder value here, and continue ignoring cancellation requests like we already do. Added a small test to make sure nothing broke. Release note (sql change): When a connection is established, CockroachDB will now return a placeholder BackendKeyData message in the response. This is for compatibility with some tools, but using the BackendKeyData to cancel a query will still have no effect, just the same as before. 60429: kv: (re-)introduce a stopgap for lack of ReplicaState synchronization r=irfansharif a=irfansharif See #59194 and #58489 for more details. In #58489 we observed a scary lack of synchronization around how we set the ReplicaState for a given replica, and how we mark a replica as "initialized". What this meant is that it was possible for the entry in Store.mu.replicas to be both "initialized" and have an empty ReplicaState. This is now more likely to bite us given the migrations infrastructure attempts to purge outdated replicas at start up time (when replicas are being initialized, and we're iterating through extan replicas in the Store.mu.replicas map). We believed this was addressed as part of #58378, but that appears not to be the case. Lets re-introduce this stop-gap while we investigate. Release note: None 60441: bazel: quash unnecessary dependency on `pkg/util/uuid` from protos r=rickystewart a=rickystewart This dependency can be replaced with a few `# keep` deps in a few choice proto targets, which is what we should have done the whole time anyway. This fixes build failures elsewhere in tree -- for example, `pkg/util/uuid:uuid_test`, which doesn't play nicely with `rules_go` in the presence of this dependency. Fixes #59778. Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There's a scary lack of synchronization around how we set the
ReplicaState for a given replica, and how we mark a replica as
"initialized". What this means is that very temporarily, it's possible
for the entry in Store.mu.replicas to be both "initialized" and have an
empty ReplicaState. This was an existing problem, but is now more likely
to bite us given the migrations infrastructure attempts to purge
outdated replicas at start up time (when replicas are being initialized,
and we're iterating through extan replicas in the Store.mu.replicas
map).
This issue has caused a bit of recent instability: #59180, #58489,
#58523, and #58378. While we work on a more considered fix to the
problem (tracked in #58489), we can introduce stop the bleeding in the
interim (and unskip some tests).
Release note: None