Description
The current behavior of the inference processor is that it updates an existing output field with new fields rather than overwriting it. For example, if the processor definition is:
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      }
    ]
  }
}
And the index already includes the field content_embedding, the inference results will be added alongside any existing subfields within content_embedding. This can cause duplicate field failures.
For the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-processor.html
It may be worth adding a note about the usage of input_output stating that the output field is not overwritten. If it already exists, any fields it contains remain when the new results are written, which can result in duplicate fields and a failure.
I think in most cases the output field should be removed prior to performing inference again.
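As a rough sketch of that workaround, a pipeline could run a remove processor on the output field before the inference processor. This assumes it is acceptable to discard whatever content_embedding currently holds; the pipeline layout below is illustrative only and reuses the field and model names from the example above. ignore_missing is set so that documents without the field do not fail at the remove step.

```json
{
  "processors": [
    {
      "remove": {
        "field": "content_embedding",
        "ignore_missing": true
      }
    },
    {
      "inference": {
        "model_id": "model_deployment_for_inference",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
```

With this ordering, the inference results would be written into a fresh content_embedding field instead of being merged with existing subfields.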