allocation: fix up random long usage with inclusive bound #139163
Merged
schase-es merged 1 commit into elastic:main on Dec 6, 2025
In the write load constraint monitor tests, a non-hotspot node was specified by generating a random queue latency between 0 and the hotspot threshold setting. The random number generation treated the threshold as an inclusive upper bound, when it needed to be exclusive, so a node meant to be non-hot could be generated with a latency exactly at the threshold. This only recently became a problem, with the addition of tests that set up a specific number of hotspot nodes and remove them one at a time over the course of the test (testHotspotCountTurnsOff). Fixes: elastic#139161
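To make the off-by-one concrete, here is a minimal, self-contained Java sketch. It does not use the Elasticsearch test helpers or reproduce the actual change: the class `QueueLatencyBounds`, its method names, and the `>=` hotspot predicate are hypothetical stand-ins, assumed only for illustration.

```java
import java.util.Random;

/** Hypothetical illustration only; not the Elasticsearch test code. */
final class QueueLatencyBounds {
    private QueueLatencyBounds() {}

    /** Uniform draw from [0, boundExclusive). Requires boundExclusive > 0. */
    static long nextLongBelow(Random random, long boundExclusive) {
        return random.longs(1, 0, boundExclusive).findFirst().getAsLong();
    }

    /** Problematic shape: draws from [0, threshold], so the result can equal the threshold. */
    static long inclusiveLatency(Random random, long thresholdMillis) {
        return nextLongBelow(random, thresholdMillis + 1);
    }

    /** Intended shape: draws from [0, threshold), so the result is always below the threshold. */
    static long exclusiveLatency(Random random, long thresholdMillis) {
        return nextLongBelow(random, thresholdMillis);
    }

    /** Assumed predicate: a latency at or above the threshold counts as a hotspot. */
    static boolean isHotspot(long latencyMillis, long thresholdMillis) {
        return latencyMillis >= thresholdMillis;
    }
}
```

Under these assumptions, an inclusive draw over [0, T] lands exactly on T with probability 1/(T+1) per node, which is enough to occasionally turn a "non-hotspot" node into a hotspot. The exclusive draw also needs a strictly positive threshold, since [0, 0) is empty.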
Collaborator
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
DiannaHohensee
approved these changes
Dec 6, 2025
    long queueLatencyThresholdMillis,
    int highUtilizationThresholdPercent
) {
    assert queueLatencyThresholdMillis > 0 : "queue latency threshold must be positive";
Contributor
I think with this check you've highlighted an issue with the production code :)
If we ever set the threshold to zero, everything would be a hot-spot. So either the canRemain check must be strictly > latency_threshold, or zero must be an invalid latency_threshold value.
Not something to resolve now, though. Filed https://elasticco.atlassian.net/browse/ES-13741.
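The two options in the comment above can be sketched in plain Java. This is a hedged illustration only: `HotspotThresholdSketch` and its methods are hypothetical names, and the `>=` comparison is an assumption inferred from the comment, not a copy of the production `canRemain` logic.

```java
/** Hypothetical illustration of the zero-threshold concern; not production code. */
final class HotspotThresholdSketch {
    private HotspotThresholdSketch() {}

    /** Assumed current shape: with threshold 0, every non-negative latency is hot. */
    static boolean hotAtOrAbove(long latencyMillis, long thresholdMillis) {
        return latencyMillis >= thresholdMillis;
    }

    /** Option 1: strict comparison, so a latency of 0 is not hot at threshold 0. */
    static boolean hotStrictlyAbove(long latencyMillis, long thresholdMillis) {
        return latencyMillis > thresholdMillis;
    }

    /** Option 2: keep the comparison but reject a zero (or negative) threshold up front. */
    static long requirePositiveThreshold(long thresholdMillis) {
        if (thresholdMillis <= 0) {
            throw new IllegalArgumentException("queue latency threshold must be positive");
        }
        return thresholdMillis;
    }
}
```

With a threshold of 0, `hotAtOrAbove(0, 0)` is true while `hotStrictlyAbove(0, 0)` is false. Option 2 sidesteps the question by making 0 an invalid setting value.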
schase-es
added a commit
to schase-es/elasticsearch
that referenced
this pull request
Jan 8, 2026
…tic#140243) The write load constraint decider test, as originally written, had a threshold error when creating a node that was not hotspotting: roughly one in three thousand nodes would accidentally receive a metric that would test as hot. This was already fixed in elastic#139163. Fixes: elastic#138924
jimczi
pushed a commit
to jimczi/elasticsearch
that referenced
this pull request
Jan 12, 2026
…tic#140243) The write load constraint decider test, as originally written, had a threshold error when creating a node that was not hotspotting: roughly one in three thousand nodes would accidentally receive a metric that would test as hot. This was already fixed in elastic#139163. Fixes: elastic#138924
elasticsearchmachine
pushed a commit
that referenced
this pull request
Mar 4, 2026
…#140243) (#140407)
* allocation: unmute failed decider test that was fixed elsewhere (#140243)
The write load constraint decider test, as originally written, had a threshold error when creating a node that was not hotspotting: roughly one in three thousand nodes would accidentally receive a metric that would test as hot. This was already fixed in #139163. Fixes: #138924
* Fixing up merge