
roachtest: fix decommission/randomize #55809

Merged
craig[bot] merged 1 commit into cockroachdb:master from tbg:fix-decom-random
Oct 27, 2020


@tbg
Member

@tbg tbg commented Oct 21, 2020

The test could end up using fully decommissioned nodes for cli commands,
which does not work as of #55286.

Fixes #55581.

Release note: None

@tbg tbg requested a review from irfansharif October 21, 2020 12:52
@cockroach-teamcity
Member

This change is Reviewable

@tbg
Member Author

tbg commented Oct 21, 2020

Draft because I stressed this locally and it's still whack. Probably will be able to spot the problem via a self-review.

@tbg
Member Author

tbg commented Oct 21, 2020

The problem is (I assume) that the test wants to see the decommissioned nodes in the `node ls` output. They won't be there anymore: once decommissioned, they lose contact with the cluster and forcibly become non-live pretty soon. Will have to massage this test a bit more.
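
To illustrate the adjustment this implies, here is a rough sketch of how a check could compute which node IDs it expects to see, assuming fully decommissioned nodes drop out of the listing. `expectedVisibleNodes` is a hypothetical helper, not a function from the test.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedVisibleNodes returns, in ascending order, the node IDs a status
// check should expect to find, under the assumption that fully
// decommissioned nodes no longer appear in the listing.
func expectedVisibleNodes(all []int, decommissioned map[int]bool) []int {
	visible := make([]int, 0, len(all))
	for _, id := range all {
		if !decommissioned[id] {
			visible = append(visible, id)
		}
	}
	sort.Ints(visible)
	return visible
}

func main() {
	all := []int{1, 2, 3, 4}
	// Assume n2 has been fully decommissioned.
	decommissioned := map[int]bool{2: true}
	fmt.Println(expectedVisibleNodes(all, decommissioned))
}
```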

@tbg tbg force-pushed the fix-decom-random branch from a9848da to 040f188 Compare October 27, 2020 09:37
@tbg tbg marked this pull request as ready for review October 27, 2020 09:40
@tbg tbg force-pushed the fix-decom-random branch from 040f188 to c873b96 Compare October 27, 2020 10:27
The test could end up using fully decommissioned nodes for cli commands,
which does not work as of cockroachdb#55286.
Additionally, decommissioned nodes now become non-live after a short
while, so various cli output checks had to be adjusted.

Fixes cockroachdb#55581.

Release note: None
@tbg tbg force-pushed the fix-decom-random branch from c873b96 to a00ffe5 Compare October 27, 2020 10:54
@tbg
Member Author

tbg commented Oct 27, 2020

Finally good to go, got 50 consecutive iterations in.

// Partially decommission then recommission n1, from another
// random node. Run a couple of status checks while doing so.
// We hard-code n1 to guard against the hypothetical case in which
// targetNode has no replicas (yet) to begin with, which would make
Contributor


I wonder how long till we'll want to support a way for operators to only mark a node as decommissioning (and not fully decommissioned). Then we'll have come full circle.

@tbg
Member Author

tbg commented Oct 27, 2020

TFTR!

bors r=irfansharif

@craig
Contributor

craig bot commented Oct 27, 2020

Build failed (retrying...):

@craig
Contributor

craig bot commented Oct 27, 2020

Build succeeded:

@craig craig bot merged commit bea5339 into cockroachdb:master Oct 27, 2020
Development

Successfully merging this pull request may close these issues.

roachtest: decommission/randomized failed

3 participants