[ML] Add per-allocation and per-deployment memory metadata fields to the trained models config #98139
Conversation
…parse-new-memory-fields
Thank you for the review @davidkyle. I updated the code as you suggested. It would be great if you could take another pass.
@elasticmachine, run elasticsearch-ci/part-1
@elasticmachine, run elasticsearch-ci/part-2
    public long getPerDeploymentMemoryBytes() {
        return metadata != null && metadata.containsKey(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())
            ? ((Number) metadata.get(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())).longValue()
            : 0L;
    }
Is it possible to create a model that makes this throw a ClassCastException by setting this field to a string in the metadata?
Also, there's no protection against negative numbers here.
I know this PR was merged ages ago, but I think it's worth adding the extra protection in a followup, as invalid configs living inside the cluster can cause big problems later on.
Since the code has shipped without protection, these reader methods will just have to return 0 in place of strings, lists, maps, or negative numbers. The Elastic "no surprises" philosophy implies we should have separate validation methods, used when a new config is put, that throw an exception if the fields exist and are not non-negative numbers. That validation can only happen on the initial put of the config, though.
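The defensive pattern discussed above could be sketched like this: a lenient reader that falls back to 0 for anything that is not a non-negative number (so existing configs never throw a ClassCastException), plus a strict check intended only for when a new config is first put. The class and method names here are hypothetical, not the shipped code.

```java
import java.util.Map;

final class MetadataMemoryFields {

    // Lenient reader: tolerates missing fields, strings, lists, maps, and
    // negative numbers by returning 0 instead of throwing.
    static long readNonNegativeLong(Map<String, Object> metadata, String field) {
        if (metadata == null) {
            return 0L;
        }
        Object value = metadata.get(field);
        if (value instanceof Number == false) {
            return 0L; // a String/List/Map here would otherwise cause a ClassCastException
        }
        long bytes = ((Number) value).longValue();
        return bytes < 0 ? 0L : bytes;
    }

    // Strict validator: only safe to call when a new config is initially put,
    // since existing cluster state may already contain invalid values.
    static void validateNonNegativeLong(Map<String, Object> metadata, String field) {
        if (metadata == null || metadata.containsKey(field) == false) {
            return; // the field is optional
        }
        Object value = metadata.get(field);
        if (value instanceof Number == false || ((Number) value).longValue() < 0) {
            throw new IllegalArgumentException(
                "[" + field + "] must be a non-negative number but was [" + value + "]"
            );
        }
    }
}
```

The split keeps reads total (no exceptions from stored configs) while still rejecting bad values at the API boundary.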
…signment planner (#98874) Building upon #98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior. I needed to adjust the old unit tests because we now use the estimateMemoryUsage routine, which computes 2*memoryBytes + 240 MB as the memory requirement; previously the unit tests simply used the memoryBytes field value.
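The two estimate shapes mentioned above can be sketched as follows. The legacy formula (2 * model size + 240 MB) is stated in the commit message; the way the refined estimate combines the new fields with the allocation count is an illustrative assumption, not the shipped implementation.

```java
final class MemoryEstimate {

    // Fixed native-process overhead used by the legacy estimate (240 MB).
    static final long MEMORY_OVERHEAD_BYTES = 240L * 1024 * 1024;

    // Legacy estimate referenced by the old unit tests: 2 * model size + 240 MB.
    static long legacyEstimate(long modelSizeBytes) {
        return 2 * modelSizeBytes + MEMORY_OVERHEAD_BYTES;
    }

    // Refined estimate once the new metadata fields are present: a fixed
    // per-deployment cost plus a per-allocation cost that scales with the
    // number of allocations (illustrative combination rule).
    static long refinedEstimate(long perDeploymentBytes, long perAllocationBytes, int allocations) {
        return perDeploymentBytes + perAllocationBytes * allocations;
    }
}
```

The key behavioral difference is that the refined estimate depends on the allocation count, which is why the old tests that used the raw memoryBytes value had to change.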
…dates This commit reverts changes to the memory usage estimation logic introduced by PR elastic#98139, which caused failures when updating the `number_of_allocations` for trained model deployments. The reversion restores the system's stability in high-availability environments. Relates elastic#107807
…on Count (#143077) The trained model stats API was computing `required_native_memory_bytes` incorrectly. Since #98139, memory estimation has depended on the number of allocations, but the code used the total allocation count across all deployments instead of each deployment’s own count. As a result, when multiple NLP models were deployed with different allocation counts, every model’s `required_native_memory_bytes` was based on the same summed value, so changing allocations for one model incorrectly changed the reported memory for others. This only affected the Stats API output, not actual deployment behavior. The fix computes `required_native_memory_bytes` per deployment using each deployment’s allocation count. `TransportGetTrainedModelsStatsAction` now passes `Map<String, AssignmentStats>` into `modelSizeStats()` instead of a summed allocation count, and `modelSizeStats()` emits per-deployment entries keyed by `deploymentId` with the correct `numberOfAllocations`. `GetTrainedModelsStatsAction.Response.Builder.build()` looks up model size stats by `deploymentId` first and falls back to `modelId` for undeployed or non-PyTorch models. Unit tests were added to cover per-deployment resolution, undeployed models, and the fallback path. Fixes #107831
…on Count (#143077) (#143295)
…on Count (#143077) (#143294)
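The fix described in the #143077 message can be sketched as follows: instead of summing allocations across all deployments and applying that single number to every model, the memory requirement is computed per deployment from that deployment's own allocation count. The class and method names here are hypothetical stand-ins for the production code, and the estimate formula reuses the illustrative per-deployment/per-allocation shape.

```java
import java.util.HashMap;
import java.util.Map;

final class PerDeploymentStats {

    // Buggy behavior: one summed allocation count feeds every model's estimate,
    // so changing one deployment's allocations changes all reported values.
    static long summedEstimate(Map<String, Integer> allocationsByDeployment,
                               long perDeploymentBytes, long perAllocationBytes) {
        int total = allocationsByDeployment.values().stream().mapToInt(Integer::intValue).sum();
        return perDeploymentBytes + perAllocationBytes * total;
    }

    // Fixed behavior: emit one entry per deploymentId, each using that
    // deployment's own allocation count.
    static Map<String, Long> requiredMemoryPerDeployment(Map<String, Integer> allocationsByDeployment,
                                                         long perDeploymentBytes, long perAllocationBytes) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : allocationsByDeployment.entrySet()) {
            result.put(e.getKey(), perDeploymentBytes + perAllocationBytes * e.getValue());
        }
        return result;
    }
}
```

Keying the result by deploymentId is what allows the stats response to fall back to modelId for undeployed or non-PyTorch models.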
To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: `per_deployment_memory_bytes` and `per_allocation_memory_bytes`.

- `per_deployment_memory_bytes` is the memory required to load the model in the deployment.
- `per_allocation_memory_bytes` is the temporary additional memory used during inference for every allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility.
In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
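For illustration, the two new fields live inside the trained model config's `metadata` object; the byte values below are made-up placeholders, not real model measurements:

```json
{
  "metadata": {
    "per_deployment_memory_bytes": 1073741824,
    "per_allocation_memory_bytes": 268435456
  }
}
```

Models whose metadata lacks these fields keep working: the reader methods return 0 and the estimation falls back to the legacy path, which is what preserves backward compatibility.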