[ML] Validate existing cluster state differently to newly submitted configs #30084

@elasticmachine

Original comment by @droberts195:

If we're going to introduce completely new job types in the future, we need to change the way unknown job/datafeed cluster state is validated.

While trying to add categorizer jobs, which are quite similar to anomaly_detector jobs, I ran into the following problem:

  • Logically, a categorizer job should have no detectors
  • But the AnalysisConfig class requires detectors
  • There are two possible solutions that seem reasonable at first glance:
    1. Have categorizer jobs have a categorization_config instead of analysis_config
    2. Change analysis_config so that detectors is not required if the job_type is categorizer
  • Unfortunately neither of these works:
    1. Old nodes will ignore categorization_config when parsing metadata, but then error because Job requires an analysis_config
    2. Old nodes will not tolerate an analysis_config with no detectors
  • The result is a messy workaround: categorizer jobs must carry an analysis_config containing fields they do not need. New nodes will ignore these fields and hide them when printing the config in REST responses, but old nodes will still show them.

I think the only long-term solution that allows the necessary degree of extensibility is to hold Jobs as an arbitrary Map<String, Object> or BytesReference when parsing from cluster state, and to interpret the contents only if the job_type is understood. This is pretty much how index settings work.
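
To make the proposal concrete, here is a minimal sketch of the "hold it as a map, interpret only if understood" idea. It is not the Elasticsearch implementation: the `RawJob` class, its method names, and the `KNOWN_TYPES` set are hypothetical and exist only for illustration. Only the field names `job_type`, `analysis_config`, `categorization_config`, and `detectors` come from the discussion above.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical holder: stores the raw job config and interprets it only on demand. */
final class RawJob {
    // Job types this (hypothetical) node version knows how to interpret.
    static final Set<String> KNOWN_TYPES = Set.of("anomaly_detector");

    private final Map<String, Object> source;

    RawJob(Map<String, Object> source) {
        // Reading from cluster state does no validation; it only keeps the map.
        this.source = Map.copyOf(source);
    }

    /** Returns the declared job type, or "unknown" if it is missing or not a string. */
    String jobType() {
        Object type = source.get("job_type");
        return type instanceof String ? (String) type : "unknown";
    }

    boolean isUnderstood() {
        return KNOWN_TYPES.contains(jobType());
    }

    /** The untouched config, so it can be written back to cluster state unchanged. */
    Map<String, Object> source() {
        return source;
    }

    /**
     * Strict interpretation, applied only to job types this node understands.
     * Unknown job types yield an empty result instead of a parse failure.
     */
    Optional<List<?>> detectors() {
        if (!isUnderstood()) {
            return Optional.empty();
        }
        Object analysis = source.get("analysis_config");
        if (!(analysis instanceof Map)) {
            throw new IllegalArgumentException("analysis_config is required");
        }
        Object detectors = ((Map<?, ?>) analysis).get("detectors");
        if (!(detectors instanceof List) || ((List<?>) detectors).isEmpty()) {
            throw new IllegalArgumentException("at least one detector is required");
        }
        return Optional.of((List<?>) detectors);
    }
}
```

With this shape, a node that predates categorizer jobs could still load a job whose `job_type` is `categorizer` and which has a `categorization_config` but no `analysis_config`. It would simply decline to interpret that job, while newly submitted configs of a known type would still go through strict validation.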
