
[ML] Add per allocation and per deployment memory metadata fields to the trained models config #98139

Merged
valeriy42 merged 37 commits into elastic:main from valeriy42:parse-new-memory-fields
Aug 21, 2023
Conversation

@valeriy42
Contributor

@valeriy42 valeriy42 commented Aug 2, 2023

To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: per_deployment_memory_bytes and per_allocation_memory_bytes.

- per_deployment_memory_bytes is the memory required to load the model for a deployment.
- per_allocation_memory_bytes is the temporary additional memory used during inference by every allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility.

In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
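For a sense of how these fields could combine, here is a minimal sketch of a per-deployment memory estimate. The method name, the fallback formula, and the 240 MB constant (taken from the follow-up #98874 description) are illustrative assumptions, not the committed estimation logic:

```java
/**
 * Illustrative sketch only: combines the two new metadata fields into a
 * required-memory estimate. Names and the fallback formula are assumptions,
 * not the actual Elasticsearch implementation.
 */
public static long estimateRequiredMemoryBytes(
    long perDeploymentMemoryBytes,
    long perAllocationMemoryBytes,
    int numberOfAllocations,
    long modelSizeBytes
) {
    if (perDeploymentMemoryBytes > 0 && perAllocationMemoryBytes > 0) {
        // Fixed cost to load the model once per deployment, plus the
        // temporary per-allocation overhead used during inference.
        return perDeploymentMemoryBytes + (long) numberOfAllocations * perAllocationMemoryBytes;
    }
    // Backward-compatible fallback when the new fields are absent: the
    // coarse heuristic of 2 * modelSize + 240 MB mentioned in #98874.
    return 2 * modelSizeBytes + 240L * 1024 * 1024;
}
```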

@valeriy42 valeriy42 added WIP :ml Machine learning v8.10.0 labels Aug 2, 2023
@valeriy42 valeriy42 marked this pull request as draft August 2, 2023 14:05
@valeriy42 valeriy42 changed the title [ML] Add memory size field to the trained models config [ML] Add per allocation and per deployment memory metadata fields to the trained models config Aug 15, 2023
@valeriy42 valeriy42 marked this pull request as ready for review August 15, 2023 10:44
@davidkyle davidkyle self-requested a review August 15, 2023 11:02
@valeriy42
Contributor Author

Thank you for the review, @davidkyle. I updated the code as you suggested. It would be great if you could take another pass.

Member

@davidkyle davidkyle left a comment

LGTM

@valeriy42
Contributor Author

@elasticmachine, run elasticsearch-ci/part-1

@valeriy42
Contributor Author

@elasticmachine, run elasticsearch-ci/part-2


public long getPerDeploymentMemoryBytes() {
    return metadata != null && metadata.containsKey(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())
        ? ((Number) metadata.get(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())).longValue()
        : 0L;
}


Is it possible to create a model that makes this throw a ClassCastException by setting this field to a string in the metadata?

Also, there is no protection against negative numbers here.

I know this PR was merged ages ago, but I think it is worth adding the extra protection in a follow-up, as invalid configs living inside the cluster can cause big problems later on.

Since the code has shipped without protection, these reader methods will just have to return 0 in place of strings, lists, maps, or negative numbers. The Elastic "no surprises" philosophy implies we should have separate methods, used when a new config is put, that validate the fields and throw exceptions if they exist and are not non-negative numbers. That validation can only happen on the initial put of the config.
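A possible shape for that defensive reader, following the return-0 behavior described above (a sketch under those assumptions, not the committed fix):

```java
// Sketch only: a defensive variant of the reader above that returns 0 for
// anything that is not a non-negative number, instead of risking a
// ClassCastException. Field and method names follow the snippet above.
public long getPerDeploymentMemoryBytes() {
    if (metadata == null) {
        return 0L;
    }
    Object value = metadata.get(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName());
    if (value instanceof Number == false) {
        return 0L; // covers strings, lists, maps, and a missing field
    }
    long bytes = ((Number) value).longValue();
    return bytes < 0 ? 0L : bytes; // treat negative values as unset
}
```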

valeriy42 added a commit that referenced this pull request Nov 6, 2023
…signment planner (#98874)

Building upon #98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior.

I needed to adjust the old unit tests because we now use the estimateMemoryUsage routine, which computes 2 * memoryBytes + 240 MB as the memory requirement. Previously, the unit tests simply used the memoryBytes field value.
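As a concrete example of the changed test expectation (the 1 GB figure is made up for illustration):

```java
// Example numbers only: with estimateMemoryUsage, a model whose memoryBytes
// is 1 GB is expected to require 2 * 1 GB + 240 MB = 2288 MB, whereas the
// old tests asserted the raw memoryBytes value.
long memoryBytes = 1024L * 1024 * 1024;                  // 1 GB
long oldExpected = memoryBytes;                          // raw field value
long newExpected = 2 * memoryBytes + 240L * 1024 * 1024; // 2288 MB
```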
valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Nov 7, 2023
…signment planner (elastic#98874)

Rassyan added a commit to Rassyan/elasticsearch that referenced this pull request Apr 24, 2024
…dates

This commit reverts changes to the memory usage estimation logic introduced by PR elastic#98139, which caused failures when updating the `number_of_allocations` for trained model deployments. The reversion restores the system's stability in high-availability environments.

Relates elastic#107807
valeriy42 added a commit that referenced this pull request Feb 27, 2026
…on Count (#143077)

The trained model stats API was computing `required_native_memory_bytes` incorrectly. Since #98139, memory estimation has depended on the number of allocations, but the code used the total allocation count across all deployments instead of each deployment’s own count. As a result, when multiple NLP models were deployed with different allocation counts, every model’s `required_native_memory_bytes` was based on the same summed value, so changing allocations for one model incorrectly changed the reported memory for others. This only affected the Stats API output, not actual deployment behavior.

The fix computes `required_native_memory_bytes` per deployment using each deployment’s allocation count. `TransportGetTrainedModelsStatsAction` now passes `Map<String, AssignmentStats>` into `modelSizeStats()` instead of a summed allocation count, and `modelSizeStats()` emits per-deployment entries keyed by `deploymentId` with the correct `numberOfAllocations`. `GetTrainedModelsStatsAction.Response.Builder.build()` looks up model size stats by `deploymentId` first and falls back to `modelId` for undeployed or non-PyTorch models. Unit tests were added to cover per-deployment resolution, undeployed models, and the fallback path.


Fixes #107831
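A hedged sketch of the lookup order described in that commit; ModelSizeStats stands in for the real response type, and the map is assumed to be keyed by deploymentId or modelId as the commit message describes:

```java
import java.util.Map;

// Illustrative only: resolve size stats by deploymentId first, then fall
// back to modelId for undeployed or non-PyTorch models. ModelSizeStats is
// a placeholder for the actual response type.
static ModelSizeStats resolveSizeStats(
    Map<String, ModelSizeStats> statsByKey,
    String deploymentId, // null when the model is not deployed
    String modelId
) {
    if (deploymentId != null && statsByKey.containsKey(deploymentId)) {
        return statsByKey.get(deploymentId); // per-deployment entry
    }
    return statsByKey.get(modelId); // fallback path
}
```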
valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Feb 27, 2026
…on Count (elastic#143077)

valeriy42 added a commit to valeriy42/elasticsearch that referenced this pull request Feb 27, 2026
…on Count (elastic#143077)

elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2026
…on Count (#143077) (#143295)

elasticsearchmachine pushed a commit that referenced this pull request Feb 27, 2026
…on Count (#143077) (#143294)

tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
…on Count (elastic#143077)
