Update Pipeline Config API #5767

Open
san81 wants to merge 15 commits into opensearch-project:main from san81:update-config-api

Conversation

@san81 san81 commented Jun 9, 2025

Description

Introducing two new Data Prepper core APIs to dynamically update pipeline configuration. The APIs do not actually update the pipeline yet; for now, this change only adds the additional path mappings and provides a place to hook in further logic. I am breaking the whole functionality into multiple small PRs, so I am sharing this one before it is fully integrated.

  1. isDynamicallyUpdatablePipelineConfig API - validates whether a given new pipeline configuration YAML can be applied on top of the pipeline's current state.
  2. updatePipelineConfig API - updates the pipeline state by swapping the processor instances.
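The two APIs above can be sketched roughly as follows. This is a hypothetical illustration only: the class name, method parameters, and the in-memory map standing in for real pipeline state are all invented for this sketch, not taken from the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: "pipeline state" is reduced to a map of
// pipeline name -> processor names, purely for illustration.
class PipelineConfigUpdaterSketch {
    private final Map<String, List<String>> pipelineProcessors = new HashMap<>();

    PipelineConfigUpdaterSketch(final Map<String, List<String>> initialState) {
        pipelineProcessors.putAll(initialState);
    }

    // In this sketch, "updatable" just means the pipeline exists; the real
    // API compares the new YAML against the pipeline's current state.
    boolean isDynamicallyUpdatablePipelineConfig(final String pipelineName) {
        return pipelineProcessors.containsKey(pipelineName);
    }

    // Mirrors the processor-instance swap described in the PR.
    void updatePipelineConfig(final String pipelineName, final List<String> newProcessors) {
        if (!isDynamicallyUpdatablePipelineConfig(pipelineName)) {
            throw new IllegalArgumentException("Unknown pipeline: " + pipelineName);
        }
        pipelineProcessors.put(pipelineName, newProcessors);
    }

    List<String> processorsOf(final String pipelineName) {
        return pipelineProcessors.get(pipelineName);
    }
}
```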

A sample API call looks like the one below. The pipeline to be updated is testpipeline, the name passed as the path parameter. The pipeline may be defined in any one of the S3 paths passed, and the goal is that this API updates only this specific pipeline (if feasible).

curl --location --request PUT 'localhost:4900/updatePipelineConfig/testpipeline' \
--header 'Content-Type: application/json' \
--data '{
    "s3paths": ["s3://pipline-configuration/pipeline-1.yaml",
                "s3://pipline-configuration/pipeline-2.yaml"
        ],
     "s3region": "us-east-1"
}'

Issues Resolved

#5716

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

san81 added 5 commits June 6, 2025 17:27
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
@san81 san81 marked this pull request as ready for review June 9, 2025 21:54
san81 added 3 commits June 9, 2025 15:17
…ne config or not

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>

public UpdatePipelineBaseHandler(final PipelinesProvider pipelinesProvider) {
this.pipelinesProvider = pipelinesProvider;
this.s3Client = S3Client.builder().region(Region.US_EAST_1).build();
Member

can we avoid hardcoding the region.

Collaborator Author

Removed that; the region is now expected to be provided by the user in the payload as well.
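A minimal sketch of what the resulting request model might look like, assuming the s3paths/s3region field names from the sample payload (the class name and validation are illustrative, not the exact code in the PR):

```java
import java.util.List;

// Hypothetical request model: field names follow the sample payload above.
class S3PathRequest {
    final List<String> s3paths;
    final String s3region;

    S3PathRequest(final List<String> s3paths, final String s3region) {
        if (s3paths == null || s3paths.isEmpty()) {
            throw new IllegalArgumentException("s3paths is required");
        }
        if (s3region == null || s3region.isBlank()) {
            // The region is no longer hardcoded; the caller must supply it.
            throw new IllegalArgumentException("s3region is required");
        }
        this.s3paths = s3paths;
        this.s3region = s3region;
    }
}
```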

san81 added 5 commits June 11, 2025 12:23
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
…t to check for dynamic update feasibility

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Member

@dlvenable dlvenable left a comment

This is a nice feature! One thing is probably missing, though: we cannot start with an S3 object. We only support refreshing from S3.

There should be some support in the data-prepper-config.yaml to load from S3 and not use the pipelines/ path.
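One possible shape for that support, purely as a sketch (these keys do not exist in the current data-prepper-config.yaml schema; they are invented here only to illustrate the idea):

```yaml
# Hypothetical sketch only - not the actual data-prepper-config.yaml schema.
# Illustrates configuring an S3 location for pipeline definitions instead of
# the default pipelines/ path.
pipeline_configuration_source:
  s3paths:
    - "s3://pipline-configuration/pipeline-1.yaml"
  s3region: "us-east-1"
```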

import java.util.Set;
import java.util.stream.Collectors;

public class DynamicPipelineUpdateUtil {
Member

Let's not use Util classes. We support Spring dependency injection. So use that instead to avoid the long-term difficulties of relying on utility classes.

Collaborator Author

Converted to Spring Service (named bean)
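The shape of that conversion, sketched without the Spring annotations (in the PR the class would carry a bean annotation such as @Named or @Service; the class and method names here are illustrative):

```java
import java.util.Set;

// Sketch of the Util-to-bean conversion: collaborators arrive through the
// constructor (where Spring would inject them) instead of static lookups.
class DynamicPipelineUpdateService {
    private final Set<String> singleThreadedPlugins;

    DynamicPipelineUpdateService(final Set<String> singleThreadedPlugins) {
        this.singleThreadedPlugins = singleThreadedPlugins;
    }

    boolean isSingleThreaded(final String processorName) {
        return singleThreadedPlugins.contains(processorName);
    }
}
```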

private final String pluginName;
private final InternalJsonModel innerModel;

public PluginModel(final String pluginName, final Map<String, Object> pluginSettings) {
Member

Were these moved within nested classes? Or just shifted in the code?

Collaborator Author

Just shifted in the code. I now adjusted my IDE formatter not to touch unmodified code.

}

@Override
public boolean equals(Object o) {
Member

Please add unit tests for these new methods.

targetProcessors = targetProcessors == null ? List.of() : targetProcessors;

// Collect single-threaded processors in current and target
Set<String> currentSingleThreaded = currentProcessors.stream()
Member

There is a bit of split logic here. We have other places to look for this annotation. Can we make use of that existing code?

}
}

public static Set<String> scanForSingleThreadAnnotatedProcessorPlugins() {
Member

This is duplicating work from the PluginFactory. We should avoid this logic split.

I don't think the PluginFactory should give all @SingleThreaded annotations. But, maybe have a loadPlugin method with a Predicate?
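The Predicate idea could look something like this sketch. Everything here is hypothetical: the annotation scan is stubbed out as a precomputed map, and no such method exists on the PluginFactory today.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: the factory exposes one filtered lookup instead of the
// caller re-scanning the classpath. The annotation check is stubbed as a
// precomputed plugin-name -> isSingleThreaded map.
class PluginFilterSketch {
    static Set<String> selectPlugins(final Map<String, Boolean> pluginToSingleThreaded,
                                     final Predicate<Map.Entry<String, Boolean>> filter) {
        return pluginToSingleThreaded.entrySet().stream()
                .filter(filter)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
}
```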

for (String targetProcessor : targetSingleThreaded) {
if (!currentSingleThreaded.contains(targetProcessor)) {
throw new DynamicPipelineConfigUpdateException(
"Cannot add new single-threaded processor: " + targetProcessor);
Member

Why do we care about this? Is it because of the state?

If so, this is insufficient. The aggregate processor retains state, but it is done in a thread-safe way.

Also, grok is currently labeled as @SingleThread because of a small code gap.

Maybe we should have an @Stateful annotation to indicate that state is kept?


private static final Logger LOG = LoggerFactory.getLogger(UpdatePipelineBaseHandler.class);
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
private static final Pattern PIPELINE_NAME_PATTERN = Pattern.compile("/([a-zA-Z0-9-]{1,28})$");
Member

This regex appears overly restrictive. I don't think we have a current pipeline name requirement actually.
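A less restrictive alternative would be to take whatever follows the final path separator, for example (the helper name is hypothetical):

```java
// Hypothetical helper: extracts the pipeline name as the final path segment,
// without imposing a character set or length limit.
class PipelineNameSketch {
    static String pipelineNameFromPath(final String path) {
        final int lastSlash = path.lastIndexOf('/');
        return lastSlash >= 0 ? path.substring(lastSlash + 1) : path;
    }
}
```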

} catch (final IllegalArgumentException e) {
LOG.warn("Invalid request parameters: {}", e.getMessage());
sendErrorResponse(exchange, HttpURLConnection.HTTP_BAD_REQUEST, e.getMessage());
} catch (final SdkClientException e) {
Member

We should not add any AWS exceptions here. Let's consolidate these exceptions into the PipelineConfigurationFileReader implementations. That will allow this code to be more flexible for other data sources.
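The consolidation could take the shape of a reader-level exception that wraps source-specific failures, so the HTTP handler never sees AWS types (the exception name is illustrative, not from the PR):

```java
// Hypothetical reader-level exception: the PipelineConfigurationFileReader
// implementation would wrap source-specific failures (e.g. SdkClientException)
// in this type, keeping the HTTP handler source-agnostic.
class PipelineConfigurationReadException extends RuntimeException {
    PipelineConfigurationReadException(final String message, final Throwable cause) {
        super(message, cause);
    }
}
```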

final S3PathRequest s3PathRequest = parseS3PathRequest(requestBody);
PipelinesDataFlowModel targetPipelinesDataFlowModel =
new PipelinesDataflowModelParser(
new PipelineConfigurationS3FileReader(s3PathRequest.s3paths, s3PathRequest.s3region)
Member

We should support reloading from the local file system as well. This will help with local testing and the additional effort should be light. It just requires a little code clean-up to determine the file path. e.g. s3paths versus filepaths.
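Distinguishing the two sources could be as simple as a scheme check, for example (the helper name is hypothetical):

```java
// Hypothetical helper: route a path to the S3 reader or the local file
// reader based on its scheme.
class ConfigPathSketch {
    static boolean isS3Path(final String path) {
        return path != null && path.startsWith("s3://");
    }
}
```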
