[ML] prefer least allocated model when a new node is added to the cluster #77756
Conversation
Pinging @elastic/ml-core (Team:ML)
.entrySet()
.stream()
.filter(entry -> entry.getValue().getAllocationState().equals(AllocationState.STOPPING) == false)
.sorted(Comparator.comparing(e -> e.getValue().getNodeRoutingTable().size()))
This is obviously not optimal, but I am not 100% sure how else to do it.
One issue here (besides performance) is that we always use counts based on the original cluster state. Meaning, as models are allocated, our iteration order isn't updated within the same pass.
This is probably acceptable.
Another option is to iterate over nodes in the outer loop. That way all allocations attempt the same node and then we move on. The only downside here is that the order of the inner iteration may change all the time.
Another option is attempting this same thing with a priority queue that updates on each iteration, popping off all the least-allocated entries and rebuilding each time, but this seems pretty expensive to me, as the whole heap would be rebuilt on every pass for a new node...
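The sorting being discussed can be sketched in isolation. This is a hypothetical illustration: the map name and shape below are made up, not the actual TrainedModelAllocation metadata classes.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: order model IDs by how many nodes each is currently
// routed to, so the least-allocated models get first pick of a new node.
public class LeastAllocatedOrder {
    static List<String> orderByFewestRoutes(Map<String, Set<String>> routesByModel) {
        return routesByModel.entrySet().stream()
            .sorted(Comparator.comparingInt(e -> e.getValue().size()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> routes = new HashMap<>();
        routes.put("model-a", Set.of("node-1", "node-2"));
        routes.put("model-b", Set.of());
        routes.put("model-c", Set.of("node-1"));
        // model-b (0 routes) comes before model-c (1) and model-a (2)
        System.out.println(orderByFewestRoutes(routes));
    }
}
```

Note the ordering is computed once from a snapshot of the routing tables, which mirrors the comment above: allocations made during the same pass do not re-sort the list.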
A low-hanging fruit here is to apply the StartTrainedModelDeploymentAction.TaskParams.mayAllocateToNode filter at the point where we collect currentNotShuttingDownNodes. That would become currentEligibleNodes, for instance, and it would contain only non-shutting-down ML nodes of an appropriate version. This would significantly reduce one of the parameters here for most clusters.
Not sure I can think of something else from a first look. But if we make this optimization, we can run a few tests to see how long this takes with realistic parameter values. It might not be a problem at all.
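The suggested pre-filtering might look roughly like the sketch below. `Node` here is a stand-in record; the real eligibility check would be StartTrainedModelDeploymentAction.TaskParams.mayAllocateToNode rather than the simple flags shown.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the suggested optimization: compute a
// "currentEligibleNodes" list once (non-shutting-down, ML-capable nodes)
// before the per-model loop, instead of re-checking eligibility each pass.
public class EligibleNodes {
    record Node(String id, boolean mlRole, boolean shuttingDown) {}

    static List<Node> eligibleNodes(Collection<Node> nodes) {
        return nodes.stream()
            .filter(n -> n.shuttingDown() == false)
            .filter(Node::mlRole)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("n1", true, false),
            new Node("n2", false, false),  // not an ML node
            new Node("n3", true, true));   // shutting down
        System.out.println(eligibleNodes(nodes)); // only n1 remains
    }
}
```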
logger.trace(
    () -> new ParameterizedMessage("[{}] [{}] current metadata before update {}", modelId, nodeId, Strings.toString(metadata))
);
Set<String> shuttingDownNodes = nodesShuttingDown(currentState);
This is old, unused code; I decided to clean it up. Unrelated to the change in this PR.
StartTrainedModelDeploymentAction.TaskParams params,
DiscoveryNode node
) {
NodeLoad load = builder.isChanged()
This behaviour might be surprising or cause a subtle bug: if you pass the builder (this is an overloaded method), you expect it to be used. I can't think of a reason not to call builder.build(). Did you have something in mind?
I can't think of a reason not to call builder.build(). Did you have something in mind?
It just seems unnecessary if nothing changed. The object could be pretty big, and constructing a brand new object (deeply copying all the hashtables) could be expensive in the case where a node is simply leaving the cluster.
It works fine in this case because we know it is OK to fall back on the TrainedModelAllocationMetadata in the cluster state, but given a few refactors and a little bit of time, that assumption may no longer hold and could lead to unexpected behaviour.
Perhaps it is not as neat, but you could test builder.isChanged() outside this function and then call the correct overload, or at least add a comment.
if you pass the builder (this is an overloaded method) you expect it to be used
I suppose it is a sneaky switch. I will update it to make it clearer, so other callers of the method aren't surprised.
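The trade-off under discussion (skipping a deep-copying build() when nothing changed) can be sketched with illustrative stand-in types; none of the names below are the actual Elasticsearch classes.

```java
// Hypothetical sketch: only pay for builder.build() (a potentially deep
// copy) when the builder actually changed something; otherwise fall back
// on the metadata already held in the cluster state.
public class LazyBuild {
    static int buildCount = 0;

    record Metadata(String version) {}

    static class Builder {
        private boolean changed = false;
        private String version = "current";

        boolean isChanged() { return changed; }

        Builder setVersion(String v) { this.version = v; this.changed = true; return this; }

        Metadata build() { buildCount++; return new Metadata(version); }
    }

    // The pattern being questioned: use the existing metadata unless the
    // builder holds updates. Cheap, but callers must know about the switch.
    static Metadata resolve(Builder builder, Metadata fromClusterState) {
        return builder.isChanged() ? builder.build() : fromClusterState;
    }

    public static void main(String[] args) {
        Metadata existing = new Metadata("current");
        Metadata unchanged = resolve(new Builder(), existing);
        Metadata updated = resolve(new Builder().setVersion("next"), existing);
        // unchanged is the same object as existing; build() ran only once
        System.out.println(unchanged == existing);
        System.out.println(updated.version() + ", builds=" + buildCount);
    }
}
```

The reviewer's alternative would move the `isChanged()` test to the call site and dispatch to the correct overload there, making the fallback explicit.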
&& modelAllocationEntry.getValue().isRoutedToNode(node.getId()) == false) {
Optional<String> failure = nodeHasCapacity(
    currentState,
    builder,
Good catch using the updated allocations here.
…tion-prefers-least-allocated-on-new-node
run elasticsearch-ci/packaging-tests-windows-sample
* master: (185 commits)
  Implement get and containsKey in terms of the wrapped innerMap (elastic#77965)
  Adjust Lucene version and enable BWC tests (elastic#77933)
  Disable BWC to upgrade to Lucene-8.10-snapshot
  Reenable MlDistributedFailureIT
  [DOCS] Fix typo for `script.painless.regex.enabled` setting value (elastic#77853)
  Upgrade to Lucene-8.10.0-snapshot-bf2fcb53079 (elastic#77801)
  [DOCS] Fix ESS install lead-in (elastic#77887)
  Resolve thirdparty gradle plugin artifacts from mavencentral (elastic#77865)
  Reduce the number of times that `LifecycleExecutionState` is parsed when running a policy. (elastic#77863)
  Utility methods to add and remove backing indices from data streams (elastic#77778)
  Use Objects.equals() instead of == to compare strings (elastic#77840)
  [ML] prefer least allocated model when a new node is added to the cluster (elastic#77756)
  Deprecate ignore_throttled parameter (elastic#77479)
  Improve LifecycleExecutionState parsing. (elastic#77855)
  [DOCS] Removes deprecated word from HLRC title. (elastic#77851)
  Remove legacy geo code from AggregationResultUtils (elastic#77702)
  Adjust SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#77758)
  Laxify SecureSM to allow creation of the JDK's innocuous threads (elastic#77789)
  [Test] Reduce concurrency when testing creation of security index (elastic#75293)
  Refactor metric PipelineAggregation integration test (elastic#77548)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java
When a new node is added to the cluster, we should first attempt to allocate the models that are currently allocated to the fewest nodes.
This way we prioritize getting allocations up and running instead of using a random selection.
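The intent can be illustrated with a minimal, hypothetical example that picks the least-allocated model for a newly joined node; the names and map shape are made up for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the PR's intent: when a new node joins, fill the
// model with the fewest current allocations first, rather than taking
// models in arbitrary order.
public class PickLeastAllocated {
    static Optional<String> nextModelForNewNode(Map<String, Integer> allocationCounts) {
        return allocationCounts.entrySet().stream()
            .min(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("model-a", 3, "model-b", 0, "model-c", 1);
        // model-b has zero allocations, so it gets the new node first
        System.out.println(nextModelForNewNode(counts));
    }
}
```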