roachtest: avoid decommissioning suspect nodes in mixed version test #106859
Merged
craig[bot] merged 1 commit into cockroachdb:master on Jul 17, 2023
After decommission pre-checks were introduced, the roachtests were updated in cockroachdb#98113 to handle the new output in certain cases, but not in all of them. In the `decommission/mixed-versions` test, which upgrades and restarts nodes, decommission requests issued shortly after a restart could fail the pre-checks because nodes are considered "suspect" for 30s after being unavailable. This change decreases the suspect time limit and ensures that nodes are considered fully available before they are decommissioned with pre-checks enabled.

Fixes: cockroachdb#101620

Release note: None
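The timing problem described above can be illustrated with a minimal, self-contained sketch. It is not the roachtest code: `suspectWindow` and `remainingSuspectTime` are hypothetical names, and the only value taken from the description is the 30s suspect period. The idea is that a decommission attempted before the window has elapsed since restart should be expected to fail the pre-checks, so the test has to wait out the remainder first.

```go
package main

import (
	"fmt"
	"time"
)

// suspectWindow is the 30s period mentioned in the PR description during
// which a restarted node is still considered "suspect". The name is a
// hypothetical one chosen for this sketch.
const suspectWindow = 30 * time.Second

// remainingSuspectTime reports how much longer a node must be left alone
// before it is no longer suspect, given the time elapsed since its restart.
// A result of zero means the window has fully passed.
func remainingSuspectTime(elapsed, window time.Duration) time.Duration {
	if elapsed >= window {
		return 0
	}
	return window - elapsed
}

func main() {
	// A decommission attempted 12s after restart is still inside the window.
	elapsed := 12 * time.Second
	wait := remainingSuspectTime(elapsed, suspectWindow)
	fmt.Printf("elapsed=%s, wait another %s before decommissioning\n", elapsed, wait)
}
```

Shortening the window reduces how long the test has to wait, which is the trade-off the change makes by lowering the suspect time limit.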
949cffe to 38c9f0b
shralex approved these changes Jul 14, 2023
erikgrinaker approved these changes Jul 17, 2023
Contributor, Author
bors r+
Contributor
Build succeeded
AlexTalks added a commit to AlexTalks/cockroach that referenced this pull request Aug 9, 2023
In cockroachdb#106859, the `decommission/mixed-versions` test was updated to properly support the decommission pre-checks introduced in 23.1. That change inadvertently introduced a bug in the test through its use of the `server.time_after_store_suspect` setting. While this setting can be used to shorten the time a store is considered suspect after a node restart, it behaves differently in 23.1 (the current predecessor major version) and 23.2: 23.2 requires the setting to be at least 10s and otherwise reverts to the default of 30s, yet this validation is not performed when the setting is overridden on the predecessor version. This change corrects that mistake by setting the value to the correct minimum and waiting out the "suspect" time after restart before attempting decommission.

Fixes: cockroachdb#107150.

Release note: None
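The version discrepancy in the commit message can be modeled with a short sketch. This is an illustration under stated assumptions, not CockroachDB's implementation: the function and constant names are hypothetical, and only the 10s minimum and 30s default come from the text above. It shows why a value set below the minimum on the predecessor version ends up behaving as the 30s default once 23.2 interprets it.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the commit message; the identifiers are hypothetical.
const (
	defaultSuspectDuration = 30 * time.Second // default named in the commit message
	minSuspectDuration     = 10 * time.Second // minimum required by 23.2
)

// effectiveSuspectDuration models the 23.2 behavior described above:
// a configured value below the minimum is not honored, and the default
// is used instead. Values at or above the minimum are used as-is.
func effectiveSuspectDuration(configured time.Duration) time.Duration {
	if configured < minSuspectDuration {
		return defaultSuspectDuration
	}
	return configured
}

func main() {
	for _, d := range []time.Duration{5 * time.Second, 10 * time.Second, 45 * time.Second} {
		fmt.Printf("configured=%s effective=%s\n", d, effectiveSuspectDuration(d))
	}
}
```

Under this model, setting the value to exactly the minimum is the shortest wait the test can rely on across both versions.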
craig bot pushed a commit that referenced this pull request Aug 10, 2023
108408: roachtest: ensure valid suspect duration in mixed version decommission r=AlexTalks a=AlexTalks

Fixes: #107150.

Release note: None

Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>