Skip to content

roachtest: allocate new cluster when c.Extend fails#115539

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
DarrylWong:cluster-extend-fail
Dec 6, 2023
Merged

roachtest: allocate new cluster when c.Extend fails#115539
craig[bot] merged 1 commit intocockroachdb:masterfrom
DarrylWong:cluster-extend-fail

Conversation

@DarrylWong
Copy link
Copy Markdown
Contributor

Previously, failure to extend the lifetime of a cluster would propagate to the worker and end up aborting the entire run. Now, we destroy the cluster and let it allocate a new one.

Fixes: #112509
Release note: none
Epic: none

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@DarrylWong
Copy link
Copy Markdown
Contributor Author

DarrylWong commented Dec 4, 2023

Note that this change moves the cluster extend check from right before the start of the test (in runTest) to where we handle cluster reuse (runWorker). Having to destroy/create/setup a cluster in runTest got messy real fast as most of the runWorker loop is to destroy/create/setup a cluster. I think this should be fine though, since we give a very generous 1 hour for the test runner to find/setup for tests and it looks like minimal work is done between the two if the extend case is hit.

https://github.com/cockroachdb/cockroach/blob/0da761b40e6f7ca1d6be53ad1492070ffc49fd61/pkg/cmd/roachtest/test_runner.go#L597

Tested on local gceworker with error injected in c.Extend() and confirmed it works.

Also running a 0.1 select probability test run in case the above note does break something.

@DarrylWong DarrylWong marked this pull request as ready for review December 4, 2023 16:23
@DarrylWong DarrylWong requested a review from a team as a code owner December 4, 2023 16:23
@DarrylWong DarrylWong requested review from herkolategan and srosenberg and removed request for a team December 4, 2023 16:23
Copy link
Copy Markdown
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong and @herkolategan)


pkg/cmd/roachtest/test_runner.go line 595 at r1 (raw file):

					// after the test finishes so that the next test can be selected. If it
					// doesn't, extend it.
					timeout := testTimeout(&testToRun.spec)

It might improve readability slightly to factor this out into a separate helper.


pkg/cmd/roachtest/test_runner.go line 609 at r1 (raw file):

							// We don't release the quota allocation - the new cluster will be
							// identical.
							testToRun.canReuseCluster = false

This block of code is duplicated with the above; it could be consolidated below in case either reuse prep. attempt (i.e.,Wipe or Extend) fails.

@DarrylWong DarrylWong force-pushed the cluster-extend-fail branch 4 times, most recently from d33bc83 to be018f4 Compare December 5, 2023 20:31
Copy link
Copy Markdown
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan and @srosenberg)


pkg/cmd/roachtest/test_runner.go line 595 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

It might improve readability slightly to factor this out into a separate helper.

Good idea since WipeForReuse is already in a helper.

Done


pkg/cmd/roachtest/test_runner.go line 609 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

This block of code is duplicated with the above; it could be consolidated below in case either reuse prep. attempt (i.e.,Wipe or Extend) fails.

Done

Previously, failure to extend the lifetime of a cluster would propagate to
the worker and end up aborting the entire run. Now, we destroy the cluster
and let it allocate a new one.

Fixes: cockroachdb#112509
Release note: none
Epic: none
@DarrylWong
Copy link
Copy Markdown
Contributor Author

Reconfirmed it works with the refactors. TFTR!

bors r=srosenberg

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 6, 2023

Build succeeded:

@craig craig bot merged commit 9c5933a into cockroachdb:master Dec 6, 2023
msbutler pushed a commit to msbutler/cockroach that referenced this pull request Dec 6, 2023
115539: roachtest: allocate new cluster when c.Extend fails r=srosenberg a=DarrylWong

Previously, failure to extend the lifetime of a cluster would propagate to the worker and end up aborting the entire run. Now, we destroy the cluster and let it allocate a new one.

Fixes: cockroachdb#112509
Release note: none
Epic: none

115649: cluster-ui: remove leading `/` in settings request r=xinhaoz a=xinhaoz

Recently we required that all requests paths from cluster-ui do not start with `/` so that they may be used with any given subpath when constructing proxy requests.

We missed removing the prefix from the `settings` api which requests all cluster settings.

Release note: None

Epic: none

----------------------
Previously, this auto stats enabled label would show disabled even though auto stats collection is enabled. This is because this value comes from the settings request which was not being issued correctly.
https://www.loom.com/share/42847f2c63484a508a128c5808df7bdd

115654: opt: fix insert fast path uniqueness check bug r=mgartner a=mgartner

This commit fixes a bug caused by creating a closure over an
incrementing integer in a loop.

Fixes cockroachdb#115377
Fixes cockroachdb#115378

Release note (bug fix): A bug has been fixed that caused node crashes
and panics when running `INSERT` queries on `REGIONAL BY ROW` tables
with `UNIQUE` constraints or indexes. The bug is only present in
version 23.2.0-beta.1.

Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@DarrylWong
Copy link
Copy Markdown
Contributor Author

blathers backport 23.2 23.1

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Dec 6, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 11f202c to blathers/backport-release-23.1-115539: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: test runner shouldn't stop when c.Extend fails

3 participants