server: fix timeouts in multi-store tests by stevendanna · Pull Request #112684 · cockroachdb/cockroach

stevendanna · 2023-10-19T13:38:40Z

I don't yet have the complete story here, but in #112385 we changed this code from looking only at initialized engines to looking at all engines.

In multi-node, multi-engine tests, this occasionally causes the test to time out with liveness problems.

What appears to happen is that, in the failing cases, the nodes' second and third stores are at a lower cluster version, which results in the second and third nodes both starting with a lower cluster version in their initial state. Additional stores on non-bootstrap nodes are initialized asynchronously during startup.

There may be a better fix, but I need to become better acquainted with the code before making more serious changes here.

Fixes #112658
Fixes #112676

Release note: None


blathers-crl · 2023-10-19T13:38:44Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

RaduBerinde

Nice find, thank you!!

I made this change because it seemed like it couldn't hurt: we should have set the version on all the stores at the beginning of bootstrapCluster, but evidently I missed something.

RaduBerinde

Ah, I think there are other uses of this function that I didn't consider.

stevendanna · 2023-10-19T14:34:53Z

bors r=RaduBerinde

craig · 2023-10-19T15:42:56Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

yuzefovich · 2023-10-19T17:02:10Z

bors r+

craig · 2023-10-19T17:02:12Z

Already running a review

craig · 2023-10-19T18:16:42Z

Build succeeded:

Bazel Essential CI (Cockroach)

stevendanna requested review from a team as code owners October 19, 2023 13:38

stevendanna changed the title from "server: fix timeouts multi-store tests" to "server: fix timeouts in multi-store tests" Oct 19, 2023

stevendanna requested a review from RaduBerinde October 19, 2023 13:39

RaduBerinde approved these changes Oct 19, 2023

yuzefovich mentioned this pull request Oct 19, 2023: kvserver: skip flaky TestReplicateQueueRebalanceMultiStore #112699 (closed)

craig bot merged commit 15de755 into cockroachdb:master Oct 19, 2023

abarganier mentioned this pull request Oct 20, 2023: server: TestNodesV2 failed #112659 (closed)

itsbilal mentioned this pull request Oct 26, 2023: roachtest: kv95/enc=false/nodes=4/ssds=8 failed #112730 (closed)

craig[bot] merged 1 commit into cockroachdb:master from stevendanna:deflake-multi-store-tests
