
[9.2] [ML] Fix: required_native_memory_bytes Calculated with Wrong Allocation Count (#143077)#143295

Merged
elasticsearchmachine merged 1 commit into elastic:9.2 from valeriy42:backport/9.2/pr-143077
Feb 27, 2026

@valeriy42 (Contributor)

Backports the following commits to 9.2:

…on Count (elastic#143077)

The trained model stats API was computing `required_native_memory_bytes` incorrectly. Since elastic#98139, memory estimation has depended on the number of allocations, but the code used the total allocation count across all deployments instead of each deployment’s own count. As a result, when multiple NLP models were deployed with different allocation counts, every model’s `required_native_memory_bytes` was based on the same summed value, so changing allocations for one model incorrectly changed the reported memory for others. This only affected the Stats API output, not actual deployment behavior.
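To make the failure mode concrete, here is a minimal Java sketch contrasting the two ways of feeding an allocation count into a memory estimate. Everything in it is hypothetical and is not the Elasticsearch code: `estimateBytes` uses a made-up linear formula (a fixed base plus a per-allocation cost) chosen only because the estimate depends on the allocation count, and `estimateWithTotal` and `estimatePerDeployment` are illustrative names for the buggy and fixed behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AllocationCountSketch {

    // Hypothetical memory estimate: a fixed base plus a per-allocation cost.
    // The real Elasticsearch formula differs; only the dependence on the
    // allocation count matters for this illustration.
    static long estimateBytes(long baseBytes, long perAllocationBytes, int numberOfAllocations) {
        return baseBytes + perAllocationBytes * numberOfAllocations;
    }

    // Buggy behaviour: every deployment is estimated with the allocation
    // count summed across all deployments.
    static Map<String, Long> estimateWithTotal(Map<String, Integer> allocationsByDeployment,
                                               long baseBytes, long perAllocationBytes) {
        int total = allocationsByDeployment.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Long> result = new LinkedHashMap<>();
        for (String deploymentId : allocationsByDeployment.keySet()) {
            result.put(deploymentId, estimateBytes(baseBytes, perAllocationBytes, total));
        }
        return result;
    }

    // Fixed behaviour: each deployment is estimated with its own allocation count.
    static Map<String, Long> estimatePerDeployment(Map<String, Integer> allocationsByDeployment,
                                                   long baseBytes, long perAllocationBytes) {
        Map<String, Long> result = new LinkedHashMap<>();
        allocationsByDeployment.forEach((deploymentId, allocations) ->
            result.put(deploymentId, estimateBytes(baseBytes, perAllocationBytes, allocations)));
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical deployments with different allocation counts.
        Map<String, Integer> allocations = new LinkedHashMap<>();
        allocations.put("model-a", 1);
        allocations.put("model-b", 4);

        long baseBytes = 1_000L;
        long perAllocationBytes = 100L;
        System.out.println("summed:         " + estimateWithTotal(allocations, baseBytes, perAllocationBytes));
        System.out.println("per deployment: " + estimatePerDeployment(allocations, baseBytes, perAllocationBytes));
    }
}
```

With these inputs the summed version reports 1,500 bytes for both deployments (the total of 5 allocations is applied to each), while the per-deployment version reports 1,100 and 1,400. Raising `model-b` to more allocations would change `model-a`'s value only in the summed version, which mirrors the cross-model effect described above.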

The fix computes `required_native_memory_bytes` per deployment using each deployment’s allocation count. `TransportGetTrainedModelsStatsAction` now passes `Map<String, AssignmentStats>` into `modelSizeStats()` instead of a summed allocation count, and `modelSizeStats()` emits per-deployment entries keyed by `deploymentId` with the correct `numberOfAllocations`. `GetTrainedModelsStatsAction.Response.Builder.build()` looks up model size stats by `deploymentId` first and falls back to `modelId` for undeployed or non-PyTorch models. Unit tests were added to cover per-deployment resolution, undeployed models, and the fallback path.
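The lookup order in the builder can be sketched as follows. This is a simplified illustration, not the code in `GetTrainedModelsStatsAction.Response.Builder.build()`: the `SizeStats` record and `resolve` helper are hypothetical, and the sketch assumes a single map whose keys are deployment IDs for deployed PyTorch models and model IDs otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class SizeStatsLookupSketch {

    // Hypothetical stand-in for per-entry size stats; not the real Elasticsearch class.
    record SizeStats(long modelSizeBytes, long requiredNativeMemoryBytes) {}

    // Resolve the size stats for one stats row: prefer the entry keyed by
    // deploymentId, then fall back to the entry keyed by modelId
    // (the path for undeployed or non-PyTorch models).
    static Optional<SizeStats> resolve(Map<String, SizeStats> sizeStatsById,
                                       String modelId, String deploymentId) {
        if (deploymentId != null) {
            SizeStats perDeployment = sizeStatsById.get(deploymentId);
            if (perDeployment != null) {
                return Optional.of(perDeployment);
            }
        }
        return Optional.ofNullable(sizeStatsById.get(modelId));
    }

    public static void main(String[] args) {
        Map<String, SizeStats> sizeStats = new HashMap<>();
        // Two deployments of the same model, each keyed by its deployment ID.
        sizeStats.put("my-model-search", new SizeStats(400L, 1_100L));
        sizeStats.put("my-model-ingest", new SizeStats(400L, 1_400L));
        // An undeployed model, keyed by its model ID only.
        sizeStats.put("my-undeployed-model", new SizeStats(250L, 900L));

        System.out.println(resolve(sizeStats, "my-model", "my-model-search"));
        System.out.println(resolve(sizeStats, "my-model", "my-model-ingest"));
        System.out.println(resolve(sizeStats, "my-undeployed-model", null));
    }
}
```

Keeping the fallback on `modelId` means rows that never had a deployment still get size stats, while deployed models no longer share a single model-level entry.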


Fixes elastic#107831
valeriy42 added the :ml, >bug, auto-merge-without-approval, backport and Team:ML labels on Feb 27, 2026
elasticsearchmachine merged commit 19063a7 into elastic:9.2 on Feb 27, 2026
35 checks passed
valeriy42 deleted the backport/9.2/pr-143077 branch on February 27, 2026, 19:33

Labels

auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), backport, >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), v9.2.7
