bootstrap: fix and clean up bootstrapping at older versions#112385
Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom Oct 18, 2023
Merged
bootstrap: fix and clean up bootstrapping at older versions#112385craig[bot] merged 1 commit intocockroachdb:masterfrom
craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
Member
c4acc70 to
5ff59e9
Compare
9651a3b to
f757c34
Compare
f757c34 to
9585a9a
Compare
9585a9a to
ed61b72
Compare
ed61b72 to
90704d2
Compare
stevendanna
approved these changes
Oct 18, 2023
Collaborator
stevendanna
left a comment
There was a problem hiding this comment.
Overall I really like this change, one less frustrating bit of confusion when trying to write a test that manipulates versions.
Member
Author
This commit fixes the logic around bootstrapping clusters at earlier versions. Currently there is some special casing around the minimum supported version which makes that work, but other backwards compatible versions don't work correctly. Specifically, after bumping the version to 24.1, a `local-mixed-23.2` config for logic tests does not work. The core problem is that we are initializing the stores with a version and then populating the initial system values at another version. I am reasonably confident this was the root cause for cockroachdb#104994, where we tried adding a 23.1 config while the min supported version was 22.2. This fix makes the logic clearer and simpler. We can only bootstrap the cluster at specific versions (the current version, and compatible released versions). When the bootstrap version is overridden, we find the latest version that we can initialize the cluster and use that for both the stores and the initial values. Then we can upgrade to the desired version from there. This simplifies the tests: we no longer need `BootstrapVersionKeyOverride` (which was confusing, given that we also have `BinaryVersionOverride`). Epic: none Release note: None
90704d2 to
f6d537e
Compare
Member
Author
|
bors r+ |
Contributor
|
Build failed (retrying...): |
Contributor
|
Build succeeded: |
stevendanna
added a commit
to stevendanna/cockroach
that referenced
this pull request
Oct 19, 2023
I don't yet have the complete story here, but in cockroachdb#112385 we changed this from looking just at initialized engines to all engines. In the case of multi-node, multi-engine tests, this occasionally results in the test timing out with liveness problems. What appears to happen is that in the cases where we fail, have their second and third stores at a lower cluster version resulting in the initial state of both the second and third nodes having a lower cluster version. Additional stores for non-bootstrap node are initialized asyncronously during start up. There may be a better fix here, but I need to become better acquainted with the code before making more serious changes here. Fixes cockroachdb#112676 Release note: None
craig bot
pushed a commit
that referenced
this pull request
Oct 19, 2023
112684: server: fix timeouts in multi-store tests r=RaduBerinde a=stevendanna I don't yet have the complete story here, but in #112385 we changed this from looking just at initialized engines to all engines. In the case of multi-node, multi-engine tests, this occasionally results in the test timing out with liveness problems. What appears to happen is that in the cases where we fail, nodes have their second and third stores at a lower cluster version resulting in the initial state of both the second and third nodes having a lower cluster version. Additional stores for non-bootstrap node are initialized asyncronously during start up. There may be a better fix here, but I need to become better acquainted with the code before making more serious changes here. Fixes #112658 Fixes #112676 Release note: None Co-authored-by: Steven Danna <danna@cockroachlabs.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note: this PR is only for the top commit, the rest is #112456.
This commit fixes the logic around bootstrapping clusters at earlier
versions. Currently there is some special casing around the minimum
supported version which makes that work, but other backwards
compatible versions don't work correctly. Specifically, after bumping
the version to 24.1, a
local-mixed-23.2config for logic tests doesnot work.
The core problem is that we are initializing the stores with a version
and then populating the initial system values at another version.
I am reasonably confident this was the root cause for #104994, where
we tried adding a 23.1 config while the min supported version was
22.2.
This fix makes the logic clearer and simpler. We can only bootstrap
the cluster at specific versions (the current version, and compatible
released versions). When the bootstrap version is overridden, we find
the latest version that we can initialize the cluster and use that for
both the stores and the initial values. Then we can upgrade to the
desired version from there.
This simplifies the tests: we no longer need
BootstrapVersionKeyOverride(which was confusing, given that we alsohave
BinaryVersionOverride).Epic: none
Release note: None