
[ML] Validate trained model deployment queue_capacity limit #89573

Merged: dimitris-athanasiou merged 2 commits into elastic:main from dimitris-athanasiou:limit-model-deployment-queue-capacity on Aug 24, 2022
@dimitris-athanasiou (Contributor)

Starting a trained model deployment creates a queue for its inference requests.
If queue_capacity is too large, allocating that queue can cause an
out-of-memory (OOM) error that crashes the node.

This commit adds validation so that queue_capacity cannot exceed 1M.
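A minimal sketch of this kind of check, written in Java since Elasticsearch is a Java project. The class QueueCapacityValidation, the constant MAX_QUEUE_CAPACITY, the error messages, the positivity check, and the use of ArrayBlockingQueue are hypothetical illustrations, not the code merged in this PR; "1M" is assumed to mean 1,000,000.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; not the validation code from this PR.
public class QueueCapacityValidation {

    // Assumption: "1M" is interpreted as 1,000,000.
    static final int MAX_QUEUE_CAPACITY = 1_000_000;

    // Rejects capacities outside (0, MAX_QUEUE_CAPACITY] before any queue is allocated.
    static void validateQueueCapacity(int queueCapacity) {
        // Assumed lower bound; the PR description only states the upper limit.
        if (queueCapacity <= 0) {
            throw new IllegalArgumentException("[queue_capacity] must be a positive integer");
        }
        if (queueCapacity > MAX_QUEUE_CAPACITY) {
            throw new IllegalArgumentException(
                "[queue_capacity] must be less than or equal to " + MAX_QUEUE_CAPACITY);
        }
    }

    // Creates a bounded queue only after validation succeeds.
    // ArrayBlockingQueue allocates its backing array eagerly, so the
    // capacity check has to run before construction.
    static BlockingQueue<Runnable> createQueue(int queueCapacity) {
        validateQueueCapacity(queueCapacity);
        return new ArrayBlockingQueue<>(queueCapacity);
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = createQueue(1024);
        System.out.println("remaining capacity: " + queue.remainingCapacity()); // prints 1024
        try {
            createQueue(Integer.MAX_VALUE); // rejected before any allocation happens
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating before constructing the queue matters because a bounded queue such as ArrayBlockingQueue allocates its backing array up front, so an oversized capacity would consume heap before a single request is enqueued.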

Closes #89555
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator)

Hi @dimitris-athanasiou, I've created a changelog YAML for you.

@edsavage (Contributor) left a comment

LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit 32d5122 into elastic:main Aug 24, 2022
@dimitris-athanasiou dimitris-athanasiou deleted the limit-model-deployment-queue-capacity branch August 24, 2022 13:52
@dimitris-athanasiou dimitris-athanasiou restored the limit-model-deployment-queue-capacity branch August 24, 2022 13:52
@dimitris-athanasiou dimitris-athanasiou deleted the limit-model-deployment-queue-capacity branch August 25, 2022 08:50
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Aug 25, 2022
As it has been backported to 8.4.1 with elastic#89611
dimitris-athanasiou added a commit that referenced this pull request Aug 25, 2022
As it has been backported to 8.4.1 with #89611

Labels

>bug, :ml (Machine learning), Team:ML (Meta label for the ML team), v8.4.1, v8.5.0

Development

Successfully merging this pull request may close these issues.

[ML] Starting trained models with a ridiculously large queue_capacity crashes the node

3 participants