roachtest: de-flake 'inconsistent'#54019
Conversation
pkg/cmd/roachtest/inconsistency.go
Outdated
| // consistency check interval. This makes sure that the consistency check runs | ||
| // against all three nodes. If it targeted only two nodes, a random one would | ||
| // fatal - not what we want. | ||
| time.Sleep(20 * time.Second) |
There was a problem hiding this comment.
I stand behind moving the cluster setting change to after the call for waitForFullReplication()
However I do not understand the 20s wait - isn't waitForFullReplication() sufficient to "wait for the cluster to come together"?
Also, separately, a question: the comment says "this makes sure that the consistency check runs against all three nodes". However the cluster setting change is onyl effected against node 2. Given that cluster settings take a little while to propagate, it looks to me like node 2 is the one that's going to predominently fall over, as it will start its checks before the other two nodes get the setting update. Is that intentional?
There was a problem hiding this comment.
However I do not understand the 20s wait - isn't waitForFullReplication() sufficient to "wait for the cluster to come together"?
You're right that this should not be necessary. If replicas are moved to the node, it is likely (though not guaranteed, but alas) that it is considered live by all. Removed.
Also, separately, a question: the comment says "this makes sure that the consistency check runs against all three nodes". However the cluster setting change is onyl effected against node 2. Given that cluster settings take a little while to propagate, it looks to me like node 2 is the one that's going to predominently fall over, as it will start its checks before the other two nodes get the setting update. Is that intentional?
The consistency check runs against all members of the range. It does not matter which node it gets run on, other than that that node's view of liveness determines who is probed as part of the check. I am choosing n2 here since n1 is the one we expect to die; it may already be dead (though it usually isn't).
This test sets up an intentionally corrupted replica and wants its node to shut down as a result of its detection. When only two of the three nodes were included in the consistency check, either one of them could end up terminating (as no obvious majority of healthy replicas can be determined). Change the test so that we wait for the cluster to come fully together before setting a low consistency check interval. Closes cockroachdb#54005. Release justification: testing Release note: None
|
bors r=knz |
|
Build succeeded: |
This test sets up an intentionally corrupted replica and wants its node
to shut down as a result of its detection. When only two of the three
nodes were included in the consistency check, either one of them could
end up terminating (as no obvious majority of healthy replicas can be
determined). Change the test so that we wait for the cluster to come
fully together before setting a low consistency check interval.
Closes #54005.
Release justification: testing
Release note: None