Integrate OpenSearch Ml-Commons into Data Prepper #5509

@Zhangxunmt

Is your feature request related to a problem? Please describe.
ML Commons is an OpenSearch plugin that manages Machine Learning models to enhance search relevance through semantic understanding. You can deploy models directly within your OpenSearch cluster or connect to externally hosted models.

For neural search, a language model converts text into vector embeddings. During ingestion, OpenSearch generates vector embeddings for text fields in incoming requests. At search time, the same model transforms query text into vector embeddings, enabling vector similarity search. It is crucial to use the same ML model for both ingestion and search to ensure consistency.
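To illustrate why ingestion and search need the same model, here is a minimal sketch. `toy_embed` is a hypothetical stand-in for a language model (it hashes tokens into buckets), and its `seed` parameter is an assumption used to mimic "a different model"; neither is part of ML Commons or Data Prepper.

```python
import math
import zlib


def toy_embed(text: str, dim: int = 8, seed: int = 0) -> list[float]:
    """Hypothetical stand-in for a language model.

    Hashes each token into one of `dim` buckets. A different `seed`
    mimics a different model with an incompatible vector space.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = zlib.crc32(f"{seed}:{token}".encode()) % dim
        vec[bucket] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


doc = "data prepper batch ingestion"
doc_vec = toy_embed(doc, seed=0)  # vector stored at ingestion time

# Query embedded with the same "model" as ingestion: identical text matches fully.
same_model = cosine(toy_embed(doc, seed=0), doc_vec)

# Query embedded with a different "model": the score is no longer meaningful.
other_model = cosine(toy_embed(doc, seed=1), doc_vec)
```

With the same seed, identical text yields a similarity of 1.0 (up to floating-point rounding). With a different seed the buckets are reassigned, so the score may drop even though the text is unchanged.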

To support offline batch ingestion, Data Prepper is proposed as the ingestion engine for transforming text into vector embeddings. The proposed processor will also support data transformation in streaming mode.

Describe the solution you'd like
Build a new processor that integrates the ml-commons ML model Predict/batch_predict APIs into the Data Prepper pipelines.
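As a rough sketch of what such a processor would need to send, the helper below builds the request for the ml-commons Predict or Batch Predict endpoint of a given model. The function name `build_predict_request`, the host, and the model ID are hypothetical, and the `parameters` payload is an assumption: its actual shape depends on the model or connector being called. No request is sent here.

```python
import json


def build_predict_request(host: str, model_id: str, parameters: dict,
                          batch: bool = False) -> tuple[str, str]:
    """Build the URL and JSON body for an ml-commons predict call.

    Hypothetical helper for illustration only; it performs no network I/O.
    """
    action = "_batch_predict" if batch else "_predict"
    url = f"{host.rstrip('/')}/_plugins/_ml/models/{model_id}/{action}"
    # Assumption: the model accepts its inputs under a "parameters" object.
    body = json.dumps({"parameters": parameters})
    return url, body


# Example values are placeholders, not real endpoints or model IDs.
url, body = build_predict_request(
    "http://localhost:9200/", "my-model-id", {"inputs": ["hello world"]}
)
batch_url, _ = build_predict_request(
    "http://localhost:9200", "my-model-id", {"inputs": ["hello world"]}, batch=True
)
```

In streaming mode a processor would presumably issue the `_predict` form per event or small group of events, while offline batch ingestion would use the `_batch_predict` form; how events are grouped is a design decision this issue leaves open.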

Describe alternatives you've considered (Optional)
Model management and the predict/batch_predict APIs have already been launched in ml-commons. This feature only integrates them into Data Prepper.

Additional context
#5433

Labels

plugin - processor: A plugin to manipulate data in the Data Prepper pipeline.

Projects

Status

Done
