
Add an option to return early from an allocate call #134786

Merged
elasticsearchmachine merged 14 commits into elastic:main from ywangd:ES-12862-return-early-from-allocate
Sep 19, 2025

Conversation

@ywangd
Member

@ywangd ywangd commented Sep 16, 2025

Instead of running the simulation all the way to balance or until no movement is possible, this PR adds an option to finish early based on the number of relocating shards. Note that this early-return mechanism is enabled only when desired balance is in use, i.e. it affects simulation only.

Resolves: ES-12862

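The early-return mechanism described above can be sketched as a simplified, hypothetical model. The `Round`/`simulate` names below are invented for illustration and are not the actual Elasticsearch classes; only the idea — stop the desired-balance simulation once shard assignments change, instead of iterating to full balance — comes from the PR.

```java
import java.util.List;

public class EarlyReturnSketch {
    // One simulated allocation round, summarized by how many relocations it started.
    record Round(int relocationsStarted) {}

    // With earlyReturn enabled (desired-balance simulation), stop as soon as a
    // round changes any shard assignment instead of iterating to full balance.
    static int simulate(List<Round> rounds, boolean earlyReturn) {
        int total = 0;
        for (Round r : rounds) {
            total += r.relocationsStarted();
            if (earlyReturn && r.relocationsStarted() > 0) {
                return total; // finish early: some shard assignment changed
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Round> rounds = List.of(new Round(0), new Round(1), new Round(3));
        System.out.println(simulate(rounds, true));  // prints 1: stops after first movement
        System.out.println(simulate(rounds, false)); // prints 4: runs to completion
    }
}
```

The point of the toy model is that the early path never reports a "balanced" end state; it only reports that at least one assignment changed, which is all the desired-balance caller needs per iteration.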
@ywangd ywangd added >non-issue :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v9.2.0 labels Sep 16, 2025
@ywangd
Member Author

ywangd commented Sep 16, 2025

Note to reviewers: This is a draft PR to ensure that I am on the right track and align with other relevant changes. Your feedback is appreciated! 🙏

Contributor

@DiannaHohensee DiannaHohensee left a comment


Left some feedback/questions.

I'm primarily concerned with the ability to configure the number of relocations before returning. The purpose of this change is to avoid ever encountering THROTTLE answers that can negatively interact with NOT_PREFERRED, resulting in allocation of a shard to NOT_PREFERRED because all the other nodes returned THROTTLE (not the decision we want to make). If the number of relocations is configurable, then we can run into throttling after 2 relocations from the ThrottlingAllocationDecider and/or ConcurrentRebalanceAllocationDecider -- maybe I'm missing other THROTTLE cases.

Additionally, maybe we can sprinkle some assertions around the balancer code that we never encounter a THROTTLE answer? It would be bad if we did 😬
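The interaction the reviewer describes can be illustrated with a toy decision model. The enum mirrors the decision types named in the comment, but the selection logic below is invented for illustration and is not the actual allocator code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThrottleInteractionSketch {
    // Toy stand-in for the allocation deciders' answer types.
    enum Type { YES, NOT_PREFERRED, THROTTLE, NO }

    // Picks a node for a shard: prefer YES, fall back to NOT_PREFERRED.
    // THROTTLE nodes are skipped, so if every better node is throttled the
    // shard lands on a NOT_PREFERRED node -- the outcome the review says the
    // early-return change is meant to avoid.
    static String pickNode(Map<String, Type> decisions) {
        String fallback = null;
        for (var e : decisions.entrySet()) {
            if (e.getValue() == Type.YES) return e.getKey();
            if (e.getValue() == Type.NOT_PREFERRED && fallback == null) fallback = e.getKey();
        }
        return fallback; // may be a NOT_PREFERRED node, or null if nothing fits
    }

    public static void main(String[] args) {
        Map<String, Type> decisions = new LinkedHashMap<>();
        decisions.put("node-1", Type.THROTTLE);      // would be YES once unthrottled
        decisions.put("node-2", Type.THROTTLE);
        decisions.put("node-3", Type.NOT_PREFERRED); // e.g. a node being drained
        System.out.println(pickNode(decisions));     // prints node-3
    }
}
```

Returning early, before relocation counts reach the throttling limits, keeps the simulation out of the state where `node-1` and `node-2` answer THROTTLE in the first place.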

if (allocation.hasTargetRelocatingShards()) {
// The shard may have been relocated on the ModelNode, but not on the cluster (RoutingNodes) if
// it is throttled. The `hasTargetRelocatingShards` check does not account for this case.
// This might be a non-issue since our intention is to have no Throttle with early returns.
Contributor


The supposed contract of tryRelocateShard, according to the method comment, is that true means the move was executed "on the cluster" 🤔 Pre-existing outdated comment, seems like.

How about we update this code

canAllocateOrRebalance == Type.YES
/* only allocate on the cluster if we are not throttled */
? routingNodes.relocateShard(shard, minNode.getNodeId(), shardSize, "rebalance", allocation.changes()).v1()
: shard.relocate(minNode.getNodeId(), shardSize)
to assert a non-throttling answer and always do the routingNodes relocate call? With your complete code changes for this task, I would never expect the balancer to run into THROTTLE again -- we're trying to avoid THROTTLE situations, which can lead to overriding not-preferred answers.
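The shape of that suggestion can be sketched with a toy model. The relocation bookkeeping below is invented (a counter standing in for the `routingNodes.relocateShard` call); only the structure — assert the answer is not THROTTLE, then always take the cluster-level relocate path instead of branching into a model-only relocate — mirrors the quoted snippet:

```java
public class NoThrottleRelocateSketch {
    enum Type { YES, NOT_PREFERRED, THROTTLE, NO }

    static int clusterRelocations = 0; // toy stand-in for relocations applied "on the cluster"

    // With early returns in place the balancer should never see THROTTLE here,
    // so instead of branching into a model-only relocate, assert and always
    // perform the cluster-level relocation.
    static void tryRelocate(Type canAllocateOrRebalance) {
        assert canAllocateOrRebalance != Type.THROTTLE : "unexpected THROTTLE during balancing";
        clusterRelocations++;
    }

    public static void main(String[] args) {
        tryRelocate(Type.YES);
        tryRelocate(Type.NOT_PREFERRED);
        System.out.println(clusterRelocations); // prints 2
    }
}
```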

Member Author


Yeah, added such an assertion in the caller, see dbd0fed

@ywangd
Member Author

ywangd commented Sep 17, 2025

The PR is now updated based on the latest discussion, specifically:

  • Whether to return early depends only on the balancer type and is not otherwise configurable
  • Relevant methods now return boolean to indicate movements/assignments
  • Added a few more assertions
  • Fixed tests that depend on BalancedShardsAllocator behaviours.

Your comments are appreciated. Thank you!

Contributor

@nicktindall nicktindall left a comment


LGTM

Comment on lines +143 to +148
assert allocation.isSimulating() == false || balancerSettings.completeEarlyOnShardAssignmentChange()
: "inconsistent states: isSimulating ["
+ allocation.isSimulating()
+ "] vs completeEarlyOnShardAssignmentChange ["
+ balancerSettings.completeEarlyOnShardAssignmentChange()
+ "]";
Member Author


Based on the production code flows, the BalancedShardsAllocator should only see a simulating RoutingAllocation when the DesiredBalanceAllocator is in use. Otherwise it should always see a non-simulating RoutingAllocation. Therefore, we could potentially just use allocation.isSimulating() and avoid the need for balancerSettings.completeEarlyOnShardAssignmentChange(). But I decided to keep it for now because:

  1. They are conceptually separate things
  2. Instead of ensuring they are identical at all times, we only really need to ensure that a simulating RoutingAllocation is handled only by a balancer with completeEarlyOnShardAssignmentChange() == true, which is what the assertion checks.
  3. It's easy to remove completeEarlyOnShardAssignmentChange in the future once we are sure there are no exceptions (production or test). For now, it feels safer to keep it.
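Point 2 can be restated as a toy boolean model: the assertion quoted earlier encodes an implication (isSimulating implies completeEarlyOnShardAssignmentChange), not an equality, so a non-simulating allocation is allowed with either balancer setting. The class below is illustrative only:

```java
public class SimulationInvariantSketch {
    // isSimulating => completeEarlyOnShardAssignmentChange, written as
    // "not simulating, or the early-return balancer is in use".
    static boolean invariantHolds(boolean isSimulating, boolean completeEarlyOnShardAssignmentChange) {
        return isSimulating == false || completeEarlyOnShardAssignmentChange;
    }

    public static void main(String[] args) {
        System.out.println(invariantHolds(true, true));   // true: simulating, desired balance in use
        System.out.println(invariantHolds(false, false)); // true: plain allocator, not simulating
        System.out.println(invariantHolds(false, true));  // true: also allowed
        System.out.println(invariantHolds(true, false));  // false: the state the assertion rejects
    }
}
```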

@ywangd ywangd marked this pull request as ready for review September 18, 2025 01:49
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. label Sep 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@ywangd ywangd requested a review from nicktindall September 18, 2025 01:50
@ywangd
Member Author

ywangd commented Sep 18, 2025

@nicktindall I added a test in bd00382. It would be great if you could give the PR another look. Thanks a lot!

Contributor

@nicktindall nicktindall left a comment


Test LGTM also!

Contributor

@henningandersen henningandersen left a comment


LGTM.

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 18, 2025
@elasticsearchmachine elasticsearchmachine merged commit 43741c2 into elastic:main Sep 19, 2025
34 checks passed
@ywangd ywangd deleted the ES-12862-return-early-from-allocate branch September 19, 2025 02:13
DiannaHohensee added a commit to DiannaHohensee/elasticsearch that referenced this pull request Sep 25, 2025
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Sep 29, 2025
elasticsearchmachine pushed a commit that referenced this pull request Sep 29, 2025
This PR reapplies #134786 with a fix for a bug that previously caused many test failures.

Resolves: #135406
Resolves: #135407
Resolves: #135150
Resolves: #135151
Resolves: #135194
Resolves: #135248
Resolves: #135249
Resolves: #135408
Resolves: #135473
Resolves: #135474

The following is the original commit message: Instead of running the simulation all the way to balance or until no movement is possible, this PR adds an option to finish early based on the number of relocating shards. Note that this early-return mechanism is enabled only when desired balance is in use, i.e. it affects simulation only.

Resolves: ES-12862
