Skip to content

allocation: fix up random long usage with inclusive bound#139163

Merged
schase-es merged 1 commit intoelastic:mainfrom
schase-es:test-failures/139161-write-load-constraint-monitor-random-long-inclusive
Dec 6, 2025
Merged

allocation: fix up random long usage with inclusive bound#139163
schase-es merged 1 commit intoelastic:mainfrom
schase-es:test-failures/139161-write-load-constraint-monitor-random-long-inclusive

Conversation

@schase-es
Copy link
Copy Markdown
Contributor

In the write load constraint monitor tests, the criteria for specifying a non-hotspot node generated a random queue latency between 0 and the hotspot threshold setting. In usage, the random number generation used the threshold as an inclusive bound, while it needed to be an exclusive bound. This became an issue recently, with the addition of tests that specify a certain number of hotspot nodes, and remove them individually throughout the test (testHotspotCountTurnsOff).

Fixes: #139161

In the write load constraint monitor tests, the criteria for specifying a
non-hotspot node generated a random queue latency between 0 and the hotspot
threshold setting. In usage, the random number generation used the threshold as
an inclusive bound, while it needed to be an exclusive bound. This became an
issue recently, with the addition of tests that specify a certain number of
hotspot nodes, and remove them individually throughout the test
(testHotspotCountTurnsOff).

Fixes: elastic#139161
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 labels Dec 5, 2025
@schase-es schase-es added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >test Issues or PRs that are addressing/adding tests Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. and removed needs:triage Requires assignment of a team area label labels Dec 5, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Copy Markdown
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you shook loose a pre-existing bug :) Lgtm 👍

long queueLatencyThresholdMillis,
int highUtilizationThresholdPercent
) {
assert queueLatencyThresholdMillis > 0 : "queue latency threshold must be positive";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with this check you've highlighted an issue with the production code :)

If we ever set the threshold to zero, everything would be a hot-spot. So either the canRemain check must be >latency_threshold, or zero must be an invalid latency_threshold value.

Not something to resolve now, though. Filed https://elasticco.atlassian.net/browse/ES-13741.

@schase-es schase-es merged commit dbde218 into elastic:main Dec 6, 2025
34 checks passed
schase-es added a commit that referenced this pull request Jan 8, 2026
)

The write load constraint decider test, as originally written, had a threshold
error when creating a node that was not hotspotting: roughly one in three
thousand nodes would accidentally receive a metric that would test as hot. This
was already fixed in #139163.

Fixes: #138924
schase-es added a commit to schase-es/elasticsearch that referenced this pull request Jan 8, 2026
…tic#140243)

The write load constraint decider test, as originally written, had a threshold
error when creating a node that was not hotspotting: roughly one in three
thousand nodes would accidentally receive a metric that would test as hot. This
was already fixed in elastic#139163.

Fixes: elastic#138924
jimczi pushed a commit to jimczi/elasticsearch that referenced this pull request Jan 12, 2026
…tic#140243)

The write load constraint decider test, as originally written, had a threshold
error when creating a node that was not hotspotting: roughly one in three
thousand nodes would accidentally receive a metric that would test as hot. This
was already fixed in elastic#139163.

Fixes: elastic#138924
elasticsearchmachine pushed a commit that referenced this pull request Mar 4, 2026
…#140243) (#140407)

* allocation: unmute failed decider test that was fixed elsewhere (#140243)

The write load constraint decider test, as originally written, had a threshold
error when creating a node that was not hotspotting: roughly one in three
thousand nodes would accidentally receive a metric that would test as hot. This
was already fixed in #139163.

Fixes: #138924

* Fixing up merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. >test Issues or PRs that are addressing/adding tests v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] WriteLoadConstraintMonitorTests testHotspotCountTurnsOff failing

3 participants