Refactor SnapshotResiliencyTests#140152
Conversation
Also expand DeterministicTaskQueue
…iency-tests-to-stateless
…iency-tests-to-stateless
Also support waiting for a condition before starting the restart node.
…iency-tests-to-stateless
…iency-tests-to-stateless
…-resiliency-tests-to-stateless
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Looks broadly good to me, just a few questions/nits.
| } | ||
| return true; | ||
| }).sorted(Comparator.comparing(n -> n.node.getName())).toList(); | ||
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(ESTestCase.randomFrom(dataNodes)); |
There was a problem hiding this comment.
nit: slight preference for qualified imports rather than qualifying the method calls
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(ESTestCase.randomFrom(dataNodes)); | |
| return dataNodes.isEmpty() ? Optional.empty() : Optional.ofNullable(randomFrom(dataNodes)); |
| } | ||
| } | ||
|
|
||
| public static class TestClusterNode { |
There was a problem hiding this comment.
Why raise this to a first-level inner static class rather than remaining an inner class of TestClusterNodes? I'm guessing it's so that we can subclass it later? But if you're in a subclass of TestClusterNodes can you not subclass TestClusterNode? Haven't tried it, just curious really.
There was a problem hiding this comment.
I sorta generally prefer top level classes. But it is also ok to keep it as an inner class of TestClusterNodes. We can still subclass it as you suggested which is what happened for DeterministicQueue#DeterministicThreadPool. So I changed it accordingly in 9d6806d
Since it touches the constructor, the change also moved both nodeSettings() and createPluginsService() methods to TestClusterNodes so that we can remove the this-escape.
There was a problem hiding this comment.
Let me know if you want TestClusterNodes to be put back into SnapshotResiliencyTests. I'd like to still make it a static class in that case. That said, I find it overall more manageable with two separate files since both of them have quite large sizes and we could potentially use the test cluster for other things in future.
| actions.put( | ||
| TransportClusterHealthAction.TYPE, |
There was a problem hiding this comment.
Do we need to register this action now? Seems like a little more than a pure refactor if so. If not, can we delay it until it's needed?
There was a problem hiding this comment.
I removed this action registration along with the doInit method in 1dbc330
| ) | ||
| ); | ||
|
|
||
| doInit(actions, actionFilters); |
There was a problem hiding this comment.
Likewise here, this hook doesn't do anything yet and adding it will be relatively noise-free later.
|
|
||
| private Coordinator coordinator; | ||
|
|
||
| @SuppressWarnings("this-escape") |
There was a problem hiding this comment.
Does this really allow a this to escape, given that we now have a separate init() method?
There was a problem hiding this comment.
Sorry I see now, I thought it picked this up earlier in the precommit run than it did. The issue is the calls to the instance methods nodeSettings() and createPluginsService(), which do not leak this here but might do so in a subclass I guess. But I wonder, do we need these to be overridable by subclasses or could we pass them in as contructor parameters instead?
|
|
||
| runUntil(documentCountVerified::get, TimeUnit.MINUTES.toMillis(5L)); | ||
| assertNotNull(safeResult(createSnapshotResponseListener)); | ||
| assertNotNull(safeResult(restoreSnapshotResponseListener)); |
There was a problem hiding this comment.
Hmm do we need to drop this assertion?
There was a problem hiding this comment.
It was a mistake. Restored in 8970408 Thanks for catching it.
| .filter(ShardRouting::primary) | ||
| .findFirst() | ||
| .orElseThrow(() -> new AssertionError("no primary shard found")); |
There was a problem hiding this comment.
Slight preference for deferring this change in behaviour.
| null, | ||
| emptySet() | ||
| ); | ||
| // TODO: The indexNameExpressionResolver does not use the same threadContext and projectResolver |
There was a problem hiding this comment.
Is this important? It sounds important...
There was a problem hiding this comment.
Practically it is not important since the tests do not rely on either multi-project or system indices. I didn't touch it because there are other inconsistencies with projectResolver and I was not sure whether it was the right PR to fix them. Since we now have these changes separately. I pushed 4b5d66a to fix them.
Also removed maybeExecuteTransportRunnable
Removed this-escape and some adjustments for constructor
|
@DaveCTurner This PR is ready for another look when you have time. Thanks! |
|
@DaveCTurner Sorry to ping you again. Your review is appreciated when convenient. Thanks a lot! |
DaveCTurner
left a comment
There was a problem hiding this comment.
Looks good now, thanks Yang
|
|
||
| continueOrDie(clusterStateResponseStepListener, clusterStateResponse -> { | ||
| final ShardRouting shardToRelocate = clusterStateResponse.getState().routingTable().allShards(index).get(0); | ||
| ; |
There was a problem hiding this comment.
nit: looks like a typo here
Refactor for `SnapshotResiliencyTests` to make it easier to extend. The main changes are (1) extract TestCluster related code into its own file and (2) add a separate init phase to the test node so that it can be customized by future subclasses. It also extracts serveral protected methods from existing code so they can be customized in future as well. Relates: elastic#138333
Refactor for
SnapshotResiliencyTeststo make it easier to extend. The main changes are (1) extract TestCluster related code into its own file and (2) add a separate init phase to the test node so that it can be customized by future subclasses. It also extracts serveral protected methods from existing code so they can be customized in future as well.Relates: #138333