[ML] Inference Process update the docs to state output field behavior #107230

@jonathan-buttner

Description

The current behavior of the inference processor is that it updates an existing output field with new fields rather than replacing it. For example, if the processor definition is:

{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      }
    ]
  }
}

and the index already includes the field content_embedding, the inference results are added alongside any existing subfields within content_embedding. This can cause duplicate field failures.

For the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html

It may be worth adding a note about the usage of input_output stating that the output field is not overwritten: if it already exists, any fields it contains remain when the new results are written, which can result in duplicate fields and a failure.

I think in most cases the output field should be removed prior to performing inference again.
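
As a rough sketch of that workaround (not something the docs currently prescribe), a pipeline could drop the output field with a `remove` processor before the inference processor runs. Placing the `remove` step first and setting `ignore_missing` are assumptions made for illustration; the model ID and field names are carried over from the example above.

```json
{
  "processors": [
    {
      "remove": {
        "field": "content_embedding",
        "ignore_missing": true
      }
    },
    {
      "inference": {
        "model_id": "model_deployment_for_inference",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
```

With `ignore_missing` set to true, documents that do not yet have content_embedding should pass through the `remove` step without error, so the same pipeline could be used for both first-time and repeated inference.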

Metadata

Labels: :ml (Machine learning), >docs (General docs changes), Feature:GenAI (Features around GenAI), Team:Docs (Meta label for docs team), Team:ML (Meta label for the ML team)