Skip to content

roachtest: fix panic in mixedversion when cockroach fails to start#103799

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
renatolabs:rc/mixedversion-fix-panic-on-startup
May 23, 2023
Merged

roachtest: fix panic in mixedversion when cockroach fails to start#103799
craig[bot] merged 1 commit intocockroachdb:masterfrom
renatolabs:rc/mixedversion-fix-panic-on-startup

Conversation

@renatolabs
Copy link
Copy Markdown

When a mixed-version test (using the mixedversion package) starts, the cockroach binary is uploaded to every node, and then the cluster is started. If, for some reason, the cockroach process crashes in this startup phase, the mixedversion package would panic while generating an error message. The reason for that is that there was an assumption that the connection cache was initialized at that point, which does not hold when the test failure happened on test setup.

This fixes this issue by making sure we check for the status of the connection cache when generating error messages. We also make sure concurrent accesses to the connection cache are safe; while this is not strictly necessary (no concurrent reads and writes to it right now), it will likely help in the future as this code changes.

Epic: none

Release note: None

When a mixed-version test (using the `mixedversion` package) starts,
the `cockroach` binary is uploaded to every node, and then the cluster
is started. If, for some reason, the cockroach process crashes in this
startup phase, the `mixedversion` package would panic while generating
an error message. The reason for that is that there was an assumption
that the connection cache was initialized at that point, which does
not hold when the test failure happened on test setup.

This fixes this issue by making sure we check for the status of the
connection cache when generating error messages. We also make sure
concurrent accesses to the connection cache are safe; while this is
not strictly necessary (no concurrent reads and writes to it right
now), it will likely help in the future as this code changes.

Epic: none

Release note: None
@renatolabs renatolabs requested a review from a team as a code owner May 23, 2023 17:46
@renatolabs renatolabs requested review from herkolategan and srosenberg and removed request for a team May 23, 2023 17:46
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 310 at r1 (raw file):

	clusterVersionsBefore := tr.clusterVersions
	var clusterVersionsAfter atomic.Value
	if tr.connCacheInitialized() {

Should we log the fact that the failure occurred during the first step, i.e., before initialization finished? Or is it already captured somewhere else?

Copy link
Copy Markdown
Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=srosenberg

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 310 at r1 (raw file):
The error itself already includes that. Here's the error I see when I forced roachtest to use bad fixtures:

[mixed-version-test/1_starting-cluster-from-fixtures-for-version-22.2.9] 16:28:48 runner.go:328: mixed-version test failure while running step 1 (starting cluster from fixtures for version "22.2.9"): ~ ./cockroach.sh

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 23, 2023

Build failed:

@renatolabs
Copy link
Copy Markdown
Author

Flaked: #103531 (comment).

bors retry

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 23, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants