Fix NOT_STARTED statuses appearing inappropirately during node shutdown#75750
Fix NOT_STARTED statuses appearing inappropirately during node shutdown#75750AthenaEryma merged 23 commits intoelastic:masterfrom
NOT_STARTED statuses appearing inappropirately during node shutdown#75750Conversation
NOT_STARTED statuses appearing when they shouldn't during node shutdownNOT_STARTED statuses appearing inappropirately during node shutdown
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
dakrone
left a comment
There was a problem hiding this comment.
Thanks Gordon, I left a few comments but nothing major
| import java.util.function.Function; | ||
| import java.util.stream.Collectors; | ||
|
|
||
| public class ShutdownService implements ClusterStateListener { |
There was a problem hiding this comment.
Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.
I wonder if it would be better named something like NodeSeenService?
There was a problem hiding this comment.
Renamed! Good call, I didn't really think about the naming at the time.
x-pack/plugin/shutdown/src/main/java/org/elasticsearch/xpack/shutdown/ShutdownService.java
Show resolved
Hide resolved
| .values() | ||
| .stream() | ||
| .map(singleNodeShutdownMetadata -> { | ||
| if (nodesNotPreviouslySeen.contains(singleNodeShutdownMetadata.getNodeId())) { |
There was a problem hiding this comment.
I think we should recalculate the nodesNotPreviouslySeen since there might be other changes since the time this cluster state update task actually runs
| return singleNodeShutdownMetadata; | ||
| }) | ||
| .collect(Collectors.toUnmodifiableMap(SingleNodeShutdownMetadata::getNodeId, Function.identity())); | ||
|
|
There was a problem hiding this comment.
Would it be worth it to add if (newShutdownMetadataMap.equals(shutdownMetadata.getAllNodeMetadataMap()) { return currentState; } to this so we avoid work if not necessary?
|
Build failures are both timeouts in tests that appear unrelated and don't reproduce locally. @elasticmachine run elasticsearch-ci/bwc |
dakrone
left a comment
There was a problem hiding this comment.
LGTM, I left one pretty minor comment
| import java.util.function.Function; | ||
| import java.util.stream.Collectors; | ||
|
|
||
| public class ShutdownService implements ClusterStateListener { |
There was a problem hiding this comment.
Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.
I wonder if it would be better named something like NodeSeenService?
|
Build failures are due to a compilation error on @elasticmachine run elasticsearch-ci/bwc |
…down (elastic#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly.
💚 Backport successful
|
…down (#75750) (#76634) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
…down (elastic#75750) (elastic#76634) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (elastic#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
…down (#75750) (#76634) (#76669) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
This PR fixes two situations where
NOT_STARTEDcan appear as the shard migration status inappropriately:It also adds tests to ensure these cases are handled correctly.
Follow-up to #75606