roachprod: run scheduled backup init without timeout#97495

Merged
craig[bot] merged 1 commit into cockroachdb:master from msbutler:butler-deflake-roachtest on Feb 22, 2023

@msbutler
Collaborator

Previously, several roachtests failed during a cluster restart because the node serving the default scheduled backup command was not yet ready to serve requests. Currently, when roachprod start returns, not every node is guaranteed to be ready to serve requests.

To prevent this failure mode, this patch changes the scheduled backup command issued during roachprod.Start() to run with an infinite timeout and only on the first node in the cluster.

Fixes #97010, #97232

Release note: None

Epic: none

@msbutler msbutler added the T-testeng TestEng Team label Feb 22, 2023
@msbutler msbutler requested a review from renatolabs February 22, 2023 18:14
@msbutler msbutler requested a review from a team as a code owner February 22, 2023 18:14
@msbutler msbutler removed the request for review from a team February 22, 2023 18:14
@msbutler msbutler self-assigned this Feb 22, 2023
@cockroach-teamcity
Member

This change is Reviewable

@msbutler
Collaborator Author

@renatolabs this patch seems to work on a few roachtests that run restarts. Happy to run it on the whole nightly suite if you'd like.

@renatolabs renatolabs left a comment

LGTM!

@msbutler msbutler force-pushed the butler-deflake-roachtest branch from abbeda4 to c1a3eed Compare February 22, 2023 19:00
@msbutler
Collaborator Author

TFTR!

bors r=renatolabs

@craig
Contributor

craig bot commented Feb 22, 2023

Build succeeded:

Labels

T-testeng TestEng Team

Development

Successfully merging this pull request may close these issues.

roachtest: tpccbench/nodes=12/cpu=16 failed

3 participants