Skip to content

bootstrap: fix and clean up bootstrapping at older versions#112385

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:bootstrap-cleanup
Oct 18, 2023
Merged

bootstrap: fix and clean up bootstrapping at older versions#112385
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:bootstrap-cleanup

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Oct 15, 2023

Note: this PR is only for the top commit, the rest is #112456.


This commit fixes the logic around bootstrapping clusters at earlier
versions. Currently there is some special casing around the minimum
supported version which makes that work, but other backwards
compatible versions don't work correctly. Specifically, after bumping
the version to 24.1, a local-mixed-23.2 config for logic tests does
not work.

The core problem is that we are initializing the stores with a version
and then populating the initial system values at another version.

I am reasonably confident this was the root cause for #104994, where
we tried adding a 23.1 config while the min supported version was
22.2.

This fix makes the logic clearer and simpler. We can only bootstrap
the cluster at specific versions (the current version, and compatible
released versions). When the bootstrap version is overridden, we find
the latest version that we can initialize the cluster and use that for
both the stores and the initial values. Then we can upgrade to the
desired version from there.

This simplifies the tests: we no longer need
BootstrapVersionKeyOverride (which was confusing, given that we also
have BinaryVersionOverride).

Epic: none
Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the bootstrap-cleanup branch 3 times, most recently from c4acc70 to 5ff59e9 Compare October 15, 2023 21:21
@RaduBerinde RaduBerinde changed the title WIP bootstrap: fix and clean up bootstrapping at older versions Oct 17, 2023
@RaduBerinde RaduBerinde requested a review from rafiss October 17, 2023 18:48
@RaduBerinde RaduBerinde marked this pull request as ready for review October 17, 2023 18:49
@RaduBerinde RaduBerinde requested review from a team as code owners October 17, 2023 18:49
@RaduBerinde RaduBerinde requested review from msbutler and removed request for a team October 17, 2023 18:49
Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I really like this change, one less frustrating bit of confusion when trying to write a test that manipulates versions.

@RaduBerinde
Copy link
Copy Markdown
Member Author

TFTR! Will merge after I merge #112456 and rebase.

As a confirmation of the fix, the new local-mixed-23 logictest config in #112271 now works.

This commit fixes the logic around bootstrapping clusters at earlier
versions. Currently there is some special casing around the minimum
supported version which makes that work, but other backwards
compatible versions don't work correctly.  Specifically, after bumping
the version to 24.1, a `local-mixed-23.2` config for logic tests does
not work.

The core problem is that we are initializing the stores with a version
and then populating the initial system values at another version.

I am reasonably confident this was the root cause for cockroachdb#104994, where
we tried adding a 23.1 config while the min supported version was
22.2.

This fix makes the logic clearer and simpler. We can only bootstrap
the cluster at specific versions (the current version, and compatible
released versions). When the bootstrap version is overridden, we find
the latest version that we can initialize the cluster and use that for
both the stores and the initial values. Then we can upgrade to the
desired version from there.

This simplifies the tests: we no longer need
`BootstrapVersionKeyOverride` (which was confusing, given that we also
have `BinaryVersionOverride`).

Epic: none
Release note: None
@RaduBerinde
Copy link
Copy Markdown
Member Author

TestReplicateQueueRebalanceMultiStore flake

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 18, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 18, 2023

Build succeeded:

@craig craig bot merged commit d2c5bbe into cockroachdb:master Oct 18, 2023
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Oct 19, 2023
I don't yet have the complete story here, but in cockroachdb#112385 we changed
this from looking just at initialized engines to all engines.

In the case of multi-node, multi-engine tests, this occasionally
results in the test timing out with liveness problems.

What appears to happen is that in the cases where we fail, have their
second and third stores at a lower cluster version resulting in the
initial state of both the second and third nodes having a lower
cluster version. Additional stores for non-bootstrap node are
initialized asyncronously during start up.

There may be a better fix here, but I need to become better acquainted
with the code before making more serious changes here.

Fixes cockroachdb#112676

Release note: None
craig bot pushed a commit that referenced this pull request Oct 19, 2023
112684: server: fix timeouts in multi-store tests r=RaduBerinde a=stevendanna

I don't yet have the complete story here, but in #112385 we changed this from looking just at initialized engines to all engines.

In the case of multi-node, multi-engine tests, this occasionally results in the test timing out with liveness problems.

What appears to happen is that in the cases where we fail, nodes have their second and third stores at a lower cluster version resulting in the initial state of both the second and third nodes having a lower cluster version. Additional stores for non-bootstrap node are initialized asyncronously during start up.

There may be a better fix here, but I need to become better acquainted with the code before making more serious changes here.

Fixes #112658
Fixes #112676

Release note: None

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
@RaduBerinde RaduBerinde deleted the bootstrap-cleanup branch November 7, 2025 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants