[ML] Starting trained models with a ridiculously large queue_capacity crashes the node #89555

@dolaru

Description

Elasticsearch Version

Found in 8.4.0

Installed Plugins

No response

Java Version

bundled

OS Version

Darwin Kernel Version 21.6.0

Problem Description

Although unlikely, if a user tries to start a trained model deployment with a ridiculously large queue_capacity (for example, 9999999999999), the node that tries to allocate the model will immediately crash.

Furthermore, the node continues to crash after every restart, as soon as it attempts to allocate the trained model deployment again.

Steps to Reproduce

  1. Import a trained model
  2. Start the trained model deployment with queue_capacity=9999999999999
  3. Notice that the ML node that attempted to start the trained model deployment has crashed
  4. Try restarting the node
  5. Notice that the node continues to crash as soon as it tries to allocate the trained model deployment
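
The steps above can be sketched as a request against the start trained model deployment API, where `queue_capacity` is accepted as a query parameter (the model ID `my_model` is a placeholder for a model that has already been imported, e.g. in step 1):

```
POST _ml/trained_models/my_model/deployment/_start?queue_capacity=9999999999999
```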

Logs (if relevant)

The Elasticsearch instance crashes immediately after logging the attempt to start the trained model deployment. No logs after that.

Metadata


Labels

:ml (Machine learning), >bug, Team:ML (Meta label for the ML team)
