[ML] Add per-allocation and per-deployment memory metadata fields to the trained models config #98139
Conversation
…parse-new-memory-fields
Thank you for the review @davidkyle. I updated the code as you suggested. It would be great if you could take another pass.
@elasticmachine, run elasticsearch-ci/part-1
@elasticmachine, run elasticsearch-ci/part-2
    public long getPerDeploymentMemoryBytes() {
        return metadata != null && metadata.containsKey(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())
            ? ((Number) metadata.get(PER_DEPLOYMENT_MEMORY_BYTES.getPreferredName())).longValue()
            : 0L;
    }
Is it possible to create a model that makes this throw a ClassCastException by setting this field to a string in the metadata?
Also, there's no protection against negative numbers here.
I know this PR was merged ages ago, but I think it's worth adding the extra protection in a followup, as invalid configs living inside the cluster can cause big problems later on.
Since the code has shipped without protection, these reader methods will just have to return 0 in place of strings, lists, maps, or negative numbers. The Elastic "no surprises" philosophy implies we should have separate validation methods, used when a new config is put, that throw an exception if the fields exist and are not non-negative numbers. That validation can only happen on the initial put of the config, though.
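The defensive pattern discussed above could be sketched like this: a lenient reader that falls back to 0 for anything that is not a non-negative number (so existing configs never throw a ClassCastException), plus a strict check intended only for when a new config is first put. The class and method names here are hypothetical, not the shipped code.

```java
import java.util.Map;

final class MetadataMemoryFields {

    // Lenient reader: tolerates missing fields, strings, lists, maps, and
    // negative numbers by returning 0 instead of throwing.
    static long readNonNegativeLong(Map<String, Object> metadata, String field) {
        if (metadata == null) {
            return 0L;
        }
        Object value = metadata.get(field);
        if (value instanceof Number == false) {
            return 0L; // a String/List/Map here would otherwise cause a ClassCastException
        }
        long bytes = ((Number) value).longValue();
        return bytes < 0 ? 0L : bytes;
    }

    // Strict validator: only safe to call when a new config is initially put,
    // since existing cluster state may already contain invalid values.
    static void validateNonNegativeLong(Map<String, Object> metadata, String field) {
        if (metadata == null || metadata.containsKey(field) == false) {
            return; // the field is optional
        }
        Object value = metadata.get(field);
        if (value instanceof Number == false || ((Number) value).longValue() < 0) {
            throw new IllegalArgumentException(
                "[" + field + "] must be a non-negative number but was [" + value + "]"
            );
        }
    }
}
```

The split keeps reads total (no exceptions from stored configs) while still rejecting bad values at the API boundary.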
…signment planner (#98874) Building upon #98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior. I needed to adjust the old unit tests because we now use the estimateMemoryUsage routine, which computes 2*memoryBytes + 240 MB as the memory requirement; previously the unit tests simply used the memoryBytes field value.
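The two estimate shapes mentioned above can be sketched as follows. The legacy formula (2 * model size + 240 MB) is stated in the commit message; the way the refined estimate combines the new fields with the allocation count is an illustrative assumption, not the shipped implementation.

```java
final class MemoryEstimate {

    // Fixed native-process overhead used by the legacy estimate (240 MB).
    static final long MEMORY_OVERHEAD_BYTES = 240L * 1024 * 1024;

    // Legacy estimate referenced by the old unit tests: 2 * model size + 240 MB.
    static long legacyEstimate(long modelSizeBytes) {
        return 2 * modelSizeBytes + MEMORY_OVERHEAD_BYTES;
    }

    // Refined estimate once the new metadata fields are present: a fixed
    // per-deployment cost plus a per-allocation cost that scales with the
    // number of allocations (illustrative combination rule).
    static long refinedEstimate(long perDeploymentBytes, long perAllocationBytes, int allocations) {
        return perDeploymentBytes + perAllocationBytes * allocations;
    }
}
```

The key behavioral difference is that the refined estimate depends on the allocation count, which is why the old tests that used the raw memoryBytes value had to change.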
…dates This commit reverts changes to the memory usage estimation logic introduced by PR elastic#98139, which caused failures when updating the `number_of_allocations` for trained model deployments. The reversion restores the system's stability in high-availability environments. Relates elastic#107807
…on Count (#143077) The trained model stats API was computing `required_native_memory_bytes` incorrectly. Since #98139, memory estimation has depended on the number of allocations, but the code used the total allocation count across all deployments instead of each deployment’s own count. As a result, when multiple NLP models were deployed with different allocation counts, every model’s `required_native_memory_bytes` was based on the same summed value, so changing allocations for one model incorrectly changed the reported memory for others. This only affected the Stats API output, not actual deployment behavior. The fix computes `required_native_memory_bytes` per deployment using each deployment’s allocation count. `TransportGetTrainedModelsStatsAction` now passes `Map<String, AssignmentStats>` into `modelSizeStats()` instead of a summed allocation count, and `modelSizeStats()` emits per-deployment entries keyed by `deploymentId` with the correct `numberOfAllocations`. `GetTrainedModelsStatsAction.Response.Builder.build()` looks up model size stats by `deploymentId` first and falls back to `modelId` for undeployed or non-PyTorch models. Unit tests were added to cover per-deployment resolution, undeployed models, and the fallback path. Fixes #107831
…on Count (#143077) (#143295)
…on Count (#143077) (#143294)
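The fix described in the #143077 message can be sketched as follows: instead of summing allocations across all deployments and applying that single number to every model, the memory requirement is computed per deployment from that deployment's own allocation count. The class and method names here are hypothetical stand-ins for the production code, and the estimate formula reuses the illustrative per-deployment/per-allocation shape.

```java
import java.util.HashMap;
import java.util.Map;

final class PerDeploymentStats {

    // Buggy behavior: one summed allocation count feeds every model's estimate,
    // so changing one deployment's allocations changes all reported values.
    static long summedEstimate(Map<String, Integer> allocationsByDeployment,
                               long perDeploymentBytes, long perAllocationBytes) {
        int total = allocationsByDeployment.values().stream().mapToInt(Integer::intValue).sum();
        return perDeploymentBytes + perAllocationBytes * total;
    }

    // Fixed behavior: emit one entry per deploymentId, each using that
    // deployment's own allocation count.
    static Map<String, Long> requiredMemoryPerDeployment(Map<String, Integer> allocationsByDeployment,
                                                         long perDeploymentBytes, long perAllocationBytes) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : allocationsByDeployment.entrySet()) {
            result.put(e.getKey(), perDeploymentBytes + perAllocationBytes * e.getValue());
        }
        return result;
    }
}
```

Keying the result by deploymentId is what allows the stats response to fall back to modelId for undeployed or non-PyTorch models.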
To improve the required memory estimation of NLP models, this PR introduces two new metadata fields: `per_deployment_memory_bytes` and `per_allocation_memory_bytes`.

- `per_deployment_memory_bytes` is the memory required to load the model in the deployment.
- `per_allocation_memory_bytes` is the temporary additional memory used during inference for every allocation.

This PR extends the memory usage estimation logic while ensuring backward compatibility.
In a follow-up PR, I will adjust the assignment planner to use the refined memory usage information.
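For illustration, the two new fields live inside the trained model config's `metadata` object; the byte values below are made-up placeholders, not real model measurements:

```json
{
  "metadata": {
    "per_deployment_memory_bytes": 1073741824,
    "per_allocation_memory_bytes": 268435456
  }
}
```

Models whose metadata lacks these fields keep working: the reader methods return 0 and the estimation falls back to the legacy path, which is what preserves backward compatibility.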