Skip to content

server: fix timeouts in multi-store tests#112684

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:deflake-multi-store-tests
Oct 19, 2023
Merged

server: fix timeouts in multi-store tests#112684
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:deflake-multi-store-tests

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna commented Oct 19, 2023

I don't yet have the complete story here, but in #112385 we changed this from looking just at initialized engines to all engines.

In the case of multi-node, multi-engine tests, this occasionally results in the test timing out with liveness problems.

What appears to happen is that in the cases where we fail, nodes have their second and third stores at a lower cluster version resulting in the initial state of both the second and third nodes having a lower cluster version. Additional stores for non-bootstrap node are initialized asyncronously during start up.

There may be a better fix here, but I need to become better acquainted with the code before making more serious changes here.

Fixes #112658
Fixes #112676

Release note: None

I don't yet have the complete story here, but in cockroachdb#112385 we changed
this from looking just at initialized engines to all engines.

In the case of multi-node, multi-engine tests, this occasionally
results in the test timing out with liveness problems.

What appears to happen is that in the cases where we fail, have their
second and third stores at a lower cluster version resulting in the
initial state of both the second and third nodes having a lower
cluster version. Additional stores for non-bootstrap node are
initialized asyncronously during start up.

There may be a better fix here, but I need to become better acquainted
with the code before making more serious changes here.

Fixes cockroachdb#112676

Release note: None
@stevendanna stevendanna requested review from a team as code owners October 19, 2023 13:38
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Oct 19, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@stevendanna stevendanna changed the title server: fix timeouts multi-store tests server: fix timeouts in multi-store tests Oct 19, 2023
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, thank you!!

I made this change because it seemed it can't hurt - we should have set the version on all the stores in the beginning of bootstrapCluster but evidently I missed something.

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, thank you!!

I made this change because it seemed it can't hurt - we should have set the version on all the stores in the beginning of bootstrapCluster but evidently I missed something.

Ah I think there are other uses of this function that I didn't consider.

@stevendanna
Copy link
Copy Markdown
Collaborator Author

bors r=RaduBerinde

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 19, 2023

Build failed (retrying...):

@yuzefovich
Copy link
Copy Markdown
Member

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 19, 2023

Already running a review

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 19, 2023

Build succeeded:

@craig craig bot merged commit 15de755 into cockroachdb:master Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: TestShowRangesMultipleStores failed kv/kvserver: TestReplicateQueueRebalanceMultiStore failed

4 participants