[Feature Request] Support system generated ingest pipelines for bulk update operations #18276

@q-andy

Description

Is your feature request related to a problem? Please describe

Ingest pipelines are currently applied inconsistently to single and bulk document update operations, as described in #17742. The result is inconsistent document processing, particularly when an update request generates multiple index operations (e.g., upsert scenarios or doc_as_upsert cases): certain flag combinations trigger ingest pipelines, while others don't.

System ingest pipelines, introduced in #17817, are intended to apply processor transformations such as embedding generation for semantic fields while abstracting away pipeline setup for users. On top of the update inconsistency described above, this adds more room for confusion: for example, semantic field users may bulk update their semantic text field without knowing that it relies on a system ingest pipeline to generate embeddings under the hood. Because the pipeline is not triggered, the text field and its underlying embedding fall out of sync, which degrades search quality.

We propose a sub-solution for the general case described in #17742 where we resolve and execute system pipelines for all update requests to make this behavior consistent. Much of this work is also shared with resolving the general case of the original issue.

Note that with this change, system ingest processors will be triggered on partial documents during bulk update operations, and those partial documents may not contain all of the fields defined in the index mapping. System ingest processors MUST handle this case gracefully (validate that fields exist before accessing them, and have clearly defined behavior when fields are missing), or else bulk update operations will fail.
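To illustrate what "handle this case gracefully" could look like, here is a minimal sketch. It deliberately does not use the real OpenSearch processor interface: `PartialDocSafeProcessor`, its field names, and the string-length stand-in for embedding generation are all hypothetical and exist only to show the guard on a missing field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the OpenSearch Processor API.
// The document is modeled as a plain map of field name to value.
final class PartialDocSafeProcessor {
    private final String sourceField;
    private final String targetField;

    PartialDocSafeProcessor(String sourceField, String targetField) {
        this.sourceField = sourceField;
        this.targetField = targetField;
    }

    /** Returns a processed copy of the document; the input is never mutated. */
    Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object value = doc.get(sourceField);
        if (!(value instanceof String)) {
            // Partial update that does not touch the source field (or carries
            // an unexpected type): defined behavior is a no-op, not a failure.
            return out;
        }
        // Placeholder for the real transformation (e.g. embedding generation).
        out.put(targetField, ((String) value).length());
        return out;
    }
}
```

With this shape, a partial document that only updates an unrelated field passes through unchanged instead of failing the bulk item.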

Describe the solution you'd like

Support system ingest pipelines for bulk update operations

Update Request Type Classification

  • Introduce a method to expose all child index requests associated with an update operation
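A rough sketch of what such a method could look like, assuming an update carries at most a partial doc and an upsert doc. `UpdateRequestSketch` and `getChildIndexRequests` are hypothetical names used for illustration, and documents are modeled as plain maps rather than real index requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an update request; not the OpenSearch class.
final class UpdateRequestSketch {
    private final Map<String, Object> doc;     // partial document, may be null
    private final Map<String, Object> upsert;  // upsert document, may be null

    UpdateRequestSketch(Map<String, Object> doc, Map<String, Object> upsert) {
        this.doc = doc;
        this.upsert = upsert;
    }

    /**
     * Exposes every child document that may end up being indexed.
     * The list order is stable, so a child's position can serve as its index
     * within the update (assumption: doc first, then upsert).
     */
    List<Map<String, Object>> getChildIndexRequests() {
        List<Map<String, Object>> children = new ArrayList<>();
        if (doc != null) {
            children.add(doc);
        }
        if (upsert != null) {
            children.add(upsert);
        }
        return children;
    }
}
```

A script-only update would return an empty list under this sketch, meaning there is nothing to run a pipeline on.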

Pipeline Resolution Enhancement

  • Use resolveSystemIngestPipeline to enable resolving only the system ingest pipeline while setting the others to NOOP
  • Based on update request fields, we extract the update request children and conditionally resolve ALL pipelines, resolve ONLY system ingest pipelines, or no pipelines at all.
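The three resolution outcomes could be modeled roughly as below. This is a sketch under stated assumptions: `PipelineResolutionSketch`, `Mode`, and the `"_none"` placeholder name for a no-op pipeline are illustrative, and the rule that picks a mode from the update request fields is intentionally left out because this proposal does not spell it out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of applying a resolution mode to one child request.
final class PipelineResolutionSketch {
    enum Mode { ALL, SYSTEM_ONLY, NONE }

    // Assumed placeholder name meaning "run no pipeline".
    static final String NOOP = "_none";

    /** Returns the pipeline to run for each pipeline kind under the given mode. */
    static Map<String, String> resolve(Mode mode,
                                       String defaultPipeline,
                                       String finalPipeline,
                                       String systemPipeline) {
        boolean runUserPipelines = mode == Mode.ALL;
        boolean runSystemPipeline = mode != Mode.NONE;

        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("default", runUserPipelines ? defaultPipeline : NOOP);
        resolved.put("final", runUserPipelines ? finalPipeline : NOOP);
        resolved.put("system", runSystemPipeline ? systemPipeline : NOOP);
        return resolved;
    }
}
```

Keeping the mode as an explicit value separates the decision (which update flags are set) from its effect (which pipelines are replaced with a no-op).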

Slot Management

  • Introduce innerSlot to track individual child index requests within an update operation
  • Use innerSlot to map pipeline execution results back to the correct child request using (slot, innerSlot) pairs
  • Maintain proper error handling and response mapping for both parent and child operations to their original bulk request slot
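The (slot, innerSlot) bookkeeping can be sketched as a flatten-then-map-back step. Everything here is a simplified assumption: `SlotMappingSketch` and `ChildRef` are hypothetical names, child requests are modeled as plain strings, and error propagation to the parent slot is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of (slot, innerSlot) tracking; not OpenSearch code.
final class SlotMappingSketch {
    /** One child request plus its position in the original bulk request. */
    static final class ChildRef {
        final int slot;       // index of the parent item in the bulk request
        final int innerSlot;  // index of the child within that parent
        final String source;  // stand-in for the child document

        ChildRef(int slot, int innerSlot, String source) {
            this.slot = slot;
            this.innerSlot = innerSlot;
            this.source = source;
        }
    }

    /** Flattens every child of every bulk item, remembering where it came from. */
    static List<ChildRef> flatten(List<List<String>> bulkItems) {
        List<ChildRef> refs = new ArrayList<>();
        for (int slot = 0; slot < bulkItems.size(); slot++) {
            List<String> children = bulkItems.get(slot);
            for (int inner = 0; inner < children.size(); inner++) {
                refs.add(new ChildRef(slot, inner, children.get(inner)));
            }
        }
        return refs;
    }

    /** Writes processed children back to their original positions, in any order. */
    static List<List<String>> mapBack(List<List<String>> bulkItems, List<ChildRef> processed) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> children : bulkItems) {
            out.add(new ArrayList<>(children));
        }
        for (ChildRef ref : processed) {
            out.get(ref.slot).set(ref.innerSlot, ref.source);
        }
        return out;
    }
}
```

Because each result carries its own pair, results can arrive out of order and still land on the right child of the right bulk item.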

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

Indexing, enhancement
