Skip to content

roachtest: ensure valid suspect duration in mixed version decommission#108408

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
AlexTalks:fix_decommission_upgrade_test
Aug 10, 2023
Merged

roachtest: ensure valid suspect duration in mixed version decommission#108408
craig[bot] merged 1 commit intocockroachdb:masterfrom
AlexTalks:fix_decommission_upgrade_test

Conversation

@AlexTalks
Copy link
Copy Markdown
Contributor

@AlexTalks AlexTalks commented Aug 9, 2023

In #106859, the decommission/mixed-versions test was updated to
properly support the decommission pre-checks introduced in 23.1, however
in doing so there was an inadvertent bug introduced in the test due to
the server.time_after_store_suspect setting. While this setting can be
used to shorten the time a store is considered suspect after node
restart, there exists a discrepency in this setting between 23.1 (the
current predecessor major version) and 23.2, as 23.2 requires the
setting to have a minimum of 10s, otherwise reverting to the default of
30s, despite the fact that this validation is not performed when the
setting is actually overridden on the predecessor version.

This change corrects that mistake, setting the value to the correct
minimum version and waiting out the "suspect" time after restart before
attempting decommission.

Fixes: #107150.

Release note: None

In cockroachdb#106859, the `decommission/mixed-versions` test was updated to
properly support the decommission pre-checks introduced in 23.1, however
in doing so there was an inadvertent bug introduced in the test due to
the `server.time_after_store_suspect` setting. While this setting can be
used to shorten the time a store is considered suspect after node
restart, there exists a discrepency in this setting between 23.1 (the
current predecessor major version) and 23.2, as 23.2 requires the
setting to have a minimum of 10s, otherwise reverting to the default of
30s, despite the fact that this validation is not performed when the
setting is actually overridden on the predecessor version.

This change corrects that mistake, setting the value to the correct
minimum version and waiting out the "suspect" time after restart before
attempting decommission.

Fixes: cockroachdb#107150.

Release note: None
@AlexTalks AlexTalks requested a review from a team as a code owner August 9, 2023 02:36
@AlexTalks AlexTalks requested review from DarrylWong and rachitgsrivastava and removed request for a team August 9, 2023 02:36
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@AlexTalks AlexTalks requested a review from andrewbaptist August 9, 2023 02:37
// NB: The suspect duration must be at least 10s, as versions 23.2 and
// beyond will reset to the default of 30s if it fails validation, even if
// set by a previous version.
const suspectDuration = 10 * time.Second
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can use the constant liveness.minTimeUntilNodeSuspect - although you would have to make that a public const. It might be easier to see where this comes from and if it changes later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had sort of assumed we had a preference against importing from internal CRDB packages in roachtests... is that not the case?

Copy link
Copy Markdown

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: - although consider changing to use the constant for the suspect duration.

@AlexTalks
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 10, 2023

Build succeeded:

@craig craig bot merged commit a598a75 into cockroachdb:master Aug 10, 2023
@AlexTalks AlexTalks deleted the fix_decommission_upgrade_test branch August 10, 2023 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: decommission/mixed-versions failed

3 participants