Skip to content

Add integ test for simulating node join left event when data node clu…#19907

Merged
shwetathareja merged 3 commits intoopensearch-project:mainfrom
SwethaGuptha:node-join-left-integ-test
Dec 29, 2025
Merged

Add integ test for simulating node join left event when data node clu…#19907
shwetathareja merged 3 commits intoopensearch-project:mainfrom
SwethaGuptha:node-join-left-integ-test

Conversation

@SwethaGuptha
Copy link
Copy Markdown
Contributor

@SwethaGuptha SwethaGuptha commented Nov 6, 2025

…ster state publication lag because the cluster applier thread being busy.

Description

Add test for cluster stability to verify cluster becomes stable after node join-left loop in cluster due to cluster publication lag because of cluster state applier thread occupied by a cluster state listener for a long running task. This simulates the scenario where the cluster applier thread is busy with shard clean up activity leading to node drops because of publication lag.

Setup:

  • Creates 7-node cluster (1 cluster manager + 6 data nodes)

  • Adds slow cluster state listener to subset of data nodes (30s sleep)

  • Continuously moves shards between nodes to trigger cluster state changes

  • Verifies cluster remains stabilizes after the cluster state listener is removed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Documentation

    • Updated changelog with an entry describing a new integration test covering node join/leave when cluster-state publication lags.
  • Tests

    • Added integration tests for publication-lag scenarios, delayed shard cleanup, and long-running listener conditions to validate cluster stability and recovery.
    • Added a test-only annotation to opt tests into the listener behavior.
    • Introduced test-only listener/plugin classes and helper methods used by the new tests.
    • Enhanced test log capture with new APIs to enqueue messages for verification.

✏️ Tip: You can customize this high-level summary in your review settings.

@SwethaGuptha SwethaGuptha requested a review from a team as a code owner November 6, 2025 09:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for 9d64a60: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from 9d64a60 to 8994161 Compare November 26, 2025 06:21
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8994161: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha
Copy link
Copy Markdown
Contributor Author

❌ Gradle check result for 8994161: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Failure due to org.opensearch.arrow.flight.stats.FlightMetricsTests.testComprehensiveMetrics

Existing issue: #18947

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from 8994161 to 49169d2 Compare November 29, 2025 17:54
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 29, 2025

📝 Walkthrough

Walkthrough

Adds a test-only index-store listener plugin, a gating annotation, two integration tests that simulate cluster-state publication lag (including long-running applier-thread delays), small test-framework log-capture APIs, and an Unreleased changelog entry. No production public API changes.

Changes

Cohort / File(s) Change Summary
Changelog
CHANGELOG.md
Adds an Unreleased entry describing an integration test that simulates node join/leave with data-node cluster-state publication lag.
Integration tests & test plugin
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java
Adds test-only TestIndexStoreListenerPlugin and TestIndexStoreListener (public static inner classes); conditional plugin registration driven by new @RequiresTestIndexStoreListener; introduces tests testClusterStabilityWhenClusterStatePublicationLagsOnShardCleanup and testClusterStabilityWhenClusterStatePublicationLagsWithLongRunningListenerOnApplierThread; adds nodeSettings plumbing, log-capture setup, helpers to simulate/validate publication lag and node drop; contains duplicated listener declarations.
Test annotation
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java
Adds runtime-retained, method-level annotation @RequiresTestIndexStoreListener to mark tests requiring the test index store listener plugin.
Test framework log capture
test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java
Adds public API methods addMessagesToCapture(Set<String>) and addMessageToCapture(String); updates imports and inverts the add logic condition that governs which messages are enqueued for capture.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CM as ClusterManager
    participant Applier as CM_ApplierThread
    participant Store as TestIndexStoreListener
    participant Data as DataNode

    CM->>Data: Publish cluster state (node join/leave)
    CM->>Applier: Schedule apply of cluster state
    Applier->>Store: Invoke IndexStoreListener callback (e.g., shard deletion)
    Store-->>Applier: Introduce delay/block (simulate lag)
    Applier-->>CM: Applier completion delayed
    CM->>Data: Publication may timeout / join may fail (logged)
    Note over CM,Data: After delay removed, cluster recovers and shards migrate
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • saratvemulapalli
  • reta
  • andrross
  • dbwiddis
  • kotwanikunal
  • owaiskazi19
  • shwetathareja
  • sachinpkale
  • cwperks
  • CEHENKLE
  • Bukhtawar

Poem

🐰 I pause the applier, slow the flow,
A rolling shard moves soft and slow,
Logs whisper timeouts, nodes skip a beat,
Then hop back in when delays retreat,
🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding an integration test for simulating node join-left events when cluster state publication lags due to cluster applier thread being busy.
Description check ✅ Passed The description provides a comprehensive overview with setup details and test objectives, though it lacks explicit sections matching the template structure (Description, Related Issues, Check List).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (1)

39-44: Consider clarifying the deduplication logic.

The reuse of shouldCaptureMessage for pattern deduplication works but is confusing, as that method was designed to check if logs match patterns (used in append), not to deduplicate patterns. This leads to asymmetric behavior: if messagesToCapture contains "foo", adding "foobar" is blocked, but if it contains "foobar", "foo" can still be added.

Consider either documenting this behavior explicitly or extracting a separate isDuplicatePattern method for clarity.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

128-133: Consider using List.of() for conciseness.

The instance initializer block works but is more verbose than necessary.

Apply this diff to simplify:

-        List<String> messagesToCapture = new ArrayList<String>() {
-            {
-                add("failed to join");
-                add("IllegalStateException");
-            }
-        };
+        List<String> messagesToCapture = new ArrayList<>(List.of(
+            "failed to join",
+            "IllegalStateException"
+        ));
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ee30dc and 49169d2.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (8 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: Analyze (java)
🔇 Additional comments (3)
test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (1)

35-37: LGTM!

The batch addition method appropriately delegates to the single-message method.

CHANGELOG.md (1)

36-37: LGTM!

The changelog entry accurately describes the new integration test and is properly formatted.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

472-477: LGTM!

The plugin implementation correctly provides the test listener.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 49169d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from 49169d2 to 3297c7c Compare December 5, 2025 09:30
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

161-168: Duplicate setting for CLUSTER_NODE_RECONNECT_INTERVAL_SETTING

nodeSettings currently sets NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING twice with different values:

.put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "10s")
...
.put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "100ms")

The second call overrides the first, which is misleading for readers.

Consider removing the unused "10s" entry or consolidating to a single put with the intended value.

🧹 Nitpick comments (5)
CHANGELOG.md (1)

39-39: Clarify changelog wording for readability

The sentence is a bit awkward; consider rephrasing to something like:

- Add integ test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy
+ Add integ test to simulate node join/left events when data node cluster state publication lags because the cluster applier thread is busy
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (4)

353-406: Static delay flag handling is correct but contains a redundant reset

The pattern of enabling TestIndexStoreListener.DELAY_SHARD_ASSIGNMENT in a try block and clearing it in finally is good and protects against leakage across tests. The explicit assignment back to false at Line 390 is redundant because the finally block already handles cleanup.

You can safely drop the in-body reset and rely on finally to reduce duplication.


472-495: Minor logging clarity in error handler

In the error callback you currently log:

(source, e) -> logger.error(() -> new ParameterizedMessage("{} unexpected error in listener wait", e), e)

Using e as the placeholder argument makes the formatted message slightly odd and also ignores the source parameter. A more typical pattern would incorporate source:

- }, (source, e) -> logger.error(() -> new ParameterizedMessage("{} unexpected error in listener wait", e), e));
+ }, (source, e) -> logger.error(
+        () -> new ParameterizedMessage("{} unexpected error in listener wait", source),
+        e
+    ));

This way the message clearly indicates which operation/listener failed while still logging the exception.


374-375: Keep delay-related constants and log expectations in sync

You introduced SHARD_DELETE_DELAY_SECONDS = 30 in TestIndexStoreListener, and both tests wait for a "Sleeping for 30 seconds" log with a 30-second timeout. The pieces line up now, but they are partially duplicated:

  • Constant: SHARD_DELETE_DELAY_SECONDS
  • Log content and waitForLog timeout values: 30

To avoid future drift if the delay is tuned, consider:

  • Using SHARD_DELETE_DELAY_SECONDS (or a derived millis value) to drive both the log text and the waitForLog timeout; and/or
  • Centralizing the "Sleeping for {} seconds" message format so tests don't need to hard-code "30".

This keeps the delay behavior and assertions consistent if you later adjust the sleep duration.

Also applies to: 433-434, 608-611, 624-635


503-544: Cluster settings scope is acceptable with Scope.TEST, but worth documenting

setupClusterForPublicationLagTest sets transient cluster settings (follower lag timeout, publish timeout, concurrent recoveries) without explicitly restoring them. Given the class is annotated with @ClusterScope(scope = Scope.TEST, numDataNodes = 0), each test gets its own cluster, so these settings won't leak to other tests.

It may still be helpful to add a short comment noting that the Scope.TEST cluster lifecycle isolates these settings, so future refactors (e.g., changing scope) don't accidentally introduce cross-test coupling.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49169d2 and 3297c7c.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (8 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: Mend Security Check
🔇 Additional comments (3)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1)

16-21: Annotation definition looks appropriate

Marker annotation scoped to methods with RUNTIME retention is a good fit for driving conditional plugin loading in tests. No changes needed.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (2)

109-123: Annotation-based plugin gating is a solid improvement

Using @RequiresTestIndexStoreListener and requiresTestIndexStoreListener() to decide whether to include TestIndexStoreListenerPlugin makes the test setup explicit and less fragile than string-based test-name checks. The reflective lookup with a safe fallback is reasonable for this usage.

Also applies to: 125-132


599-641: IndexStoreListener plugin and delay behavior look correct

The test-only TestIndexStoreListenerPlugin and TestIndexStoreListener implementation correctly:

  • Registers via the IndexStorePlugin SPI.
  • Uses a volatile static flag to toggle delay.
  • Applies a bounded, logged sleep around shard-path deletion.
  • Restores interrupt status when interrupted.

This is a clean way to simulate slow shard cleanup on the applier thread; no changes needed.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 5, 2025

❌ Gradle check result for 3297c7c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from db09f4f to 8270f3a Compare December 9, 2025 01:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

375-378: Variable shadowing: local nodeSettings shadows instance field.

The local variable nodeSettings at line 376 shadows the instance field declared at line 104. This could lead to maintenance confusion.

         for (String nodeName : internalCluster().getNodeNames()) {
-            Settings nodeSettings = internalCluster().getInstance(Settings.class, nodeName);
-            if ("red".equals(nodeSettings.get("node.attr.color")) && nodeSettings.getAsBoolean("node.data", true)) {
+            Settings instanceSettings = internalCluster().getInstance(Settings.class, nodeName);
+            if ("red".equals(instanceSettings.get("node.attr.color")) && instanceSettings.getAsBoolean("node.data", true)) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db09f4f and 8270f3a.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (9 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java
  • CHANGELOG.md
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, macos-15)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: Mend Security Check
🔇 Additional comments (5)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (5)

107-130: LGTM on annotation-based plugin loading.

The reflection-based approach for detecting @RequiresTestIndexStoreListener is more robust than the previous string-based method name matching. The fallback to false on NoSuchMethodException handles edge cases gracefully.


351-368: LGTM on static flag cleanup.

The try-finally pattern ensures DELAY_SHARD_ASSIGNMENT is reset even if the test fails mid-execution, preventing test pollution.


415-422: Transient settings are isolated by test scope.

The previous review flagged potential setting leakage, but since @ClusterScope(scope = Scope.TEST) creates a fresh cluster per test method, transient settings won't persist across tests. The current approach is acceptable, though explicit reset in a finally block would be more defensive for future refactoring.


564-601: LGTM on TestIndexStoreListener implementation.

The implementation correctly addresses previous review feedback:

  • Uses a named constant SHARD_DELETE_DELAY_SECONDS instead of magic number
  • Properly restores interrupt status with Thread.currentThread().interrupt() (line 593)
  • volatile modifier on DELAY_SHARD_ASSIGNMENT ensures visibility across threads

440-490: Validation logic is comprehensive.

The validation methods properly verify:

  • Node drop occurrence via cluster health and specific log patterns
  • Cluster recovery with correct node count and shard activation
  • Shard migration to red nodes using routing table inspection

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 9, 2025

❌ Gradle check result for 8270f3a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from 8270f3a to 1ec8f19 Compare December 9, 2025 04:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

158-167: Duplicate CLUSTER_NODE_RECONNECT_INTERVAL_SETTING key in nodeSettings.

The builder sets NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING twice:

.put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "10s")
...
.put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "100ms")

The second call silently overwrites the first, so the effective interval is "100ms" and the "10s" value is dead configuration.

Please remove one of them (and keep only the intended value) to avoid confusion:

-            .put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "10s")
             .put(FollowersChecker.FOLLOWER_CHECK_TIMEOUT_SETTING.getKey(), "200ms")
             .put(FollowersChecker.FOLLOWER_CHECK_INTERVAL_SETTING.getKey(), "100ms")
             .put(FollowersChecker.FOLLOWER_CHECK_RETRY_COUNT_SETTING.getKey(), 1)
             .put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "100ms")

(or vice versa, if "10s" was intended).

🧹 Nitpick comments (4)
test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (1)

18-44: Capture API semantics are correct but a bit subtle.

addMessageToCapture only appends when shouldCaptureMessage(message) is false, effectively treating messagesToCapture as a set of substring patterns and avoiding duplicates. That works for the current usage (adding distinct patterns like "Sleeping for 30 seconds" and "reason: lagging"), but it’s not immediately obvious from the code.

If you want to make this clearer and a bit more robust:

  • Consider explicitly checking for duplicates via messagesToCapture.contains(message) instead of reusing shouldCaptureMessage, which is conceptually “does this log contain any capture pattern?” rather than “is this pattern already present?”.
  • Alternatively, change messagesToCapture to a Set<String> (if ordering doesn’t matter) to encode uniqueness directly.

Not a blocker, but this would make the behavior easier to reason about for future readers.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (3)

140-151: Simplify messagesToCapture initialization to avoid double-brace idiom.

The anonymous ArrayList<String>() { { ... } } pattern works but creates an extra anonymous class and is harder to read.

You can express the same thing more simply and keep the list mutable:

-        List<String> messagesToCapture = new ArrayList<String>() {
-            {
-                add("failed to join");
-                add("IllegalStateException");
-            }
-        };
+        List<String> messagesToCapture = new ArrayList<>();
+        messagesToCapture.add("failed to join");
+        messagesToCapture.add("IllegalStateException");

This keeps compatibility with the later addMessagesToCapture calls while making the setup more straightforward.


440-467: Log assertions are strong; consider test runtime if logs don’t appear.

validateNodeDropDueToPublicationLag() thoroughly checks for:

  • The cluster dropping below 9 nodes,
  • The 30s delay log,
  • The node-removal task log with NodeRemovalClusterStateTaskExecutor and reason: lagging, and
  • The join failures.

Each waitForLog uses up to 30 seconds, and combined with the health wait of up to 120 seconds, the test could run for a while if something goes wrong. That’s acceptable for a heavy integ test, but if you ever see CI timeouts here, reducing some of these per-log timeouts (or sharing a common deadline) could help.

No change required now; just something to keep in mind if you tune test runtimes later.


492-502: Reuse shared delay constant for the 30s applier-thread sleep.

createDelayListener hardcodes 30 * 1000 while TestIndexStoreListener already defines SHARD_DELETE_DELAY_SECONDS = 30. For consistency and easier future tuning, consider reusing a shared constant (or at least a named constant in this class) instead of the literal:

-    private ClusterStateListener createDelayListener(ClusterApplierService applierService) {
-        return event -> applierService.runOnApplierThread("NodeJoinLeftIT", clusterState -> {
-            try {
-                logger.info("Sleeping for 30 seconds");
-                Thread.sleep(30 * 1000);
+    private static final int APPLIER_DELAY_SECONDS = 30;
+
+    private ClusterStateListener createDelayListener(ClusterApplierService applierService) {
+        return event -> applierService.runOnApplierThread("NodeJoinLeftIT", clusterState -> {
+            try {
+                logger.info("Sleeping for {} seconds", APPLIER_DELAY_SECONDS);
+                Thread.sleep(APPLIER_DELAY_SECONDS * 1000L);
             } catch (InterruptedException e) {
                 logger.info("Interrupted while waiting for cluster state applier");
                 Thread.currentThread().interrupt();
             }
         }, (source, e) -> logger.error(() -> new ParameterizedMessage("{} unexpected error in listener wait", source), e));
     }

Not functionally required, but it avoids magic numbers and keeps the delay configuration in one place.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8270f3a and 1ec8f19.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (9 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, macos-15)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Mend Security Check
🔇 Additional comments (6)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1)

16-21: Annotation design is appropriate and minimal.

Runtime retention and METHOD target are a good fit for gating the test listener; no issues here.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (5)

101-121: Annotation-based plugin gating looks good; reflection approach is reasonable.

Using RequiresTestIndexStoreListener plus requiresTestIndexStoreListener() to decide whether to add TestIndexStoreListenerPlugin is a clear improvement over string-based test-name matching and keeps the plugin test-scoped.

One small consideration: getClass().getMethod(getTestName()) assumes the test name corresponds exactly to a no-arg method on this class. That’s true for the current JUnit-style tests; if this class ever starts using parameterized or dynamically named tests, this logic may need revisiting. For now, it’s acceptable.


351-368: Static delay flag management is correct and safely cleaned up.

The combination of:

  • @RequiresTestIndexStoreListener to ensure the plugin is present for this test, and
  • TestIndexStoreListener.DELAY_SHARD_ASSIGNMENT guarded by a try/finally

ensures the static flag is always reset even if assertions fail. This addresses the earlier concern about static mutable state leaking between tests.

No changes needed here.


396-438: Lag-setup helper and log-capture wiring look coherent.

additionalSetupForLagDuringDataMigration() cleanly centralizes:

  • Cluster topology expansion (extra cluster-manager-only and data-only nodes),
  • Index creation, and
  • Transient settings for follower lag, publish timeout, and concurrent recoveries.

Coupling that with:

testLogsAppender.addMessagesToCapture(Set.of(
    "Sleeping for 30 seconds",
    "NodeRemovalClusterStateTaskExecutor",
    "reason: lagging"
));
testLogsAppender.clearCapturedLogs();

sets up exactly the messages asserted in validateNodeDropDueToPublicationLag() while starting from an empty captured-log buffer. Overall this helper is well-structured.

Given @ClusterScope(scope = Scope.TEST), the transient settings are naturally scoped to each test’s cluster instance, so explicit reset isn’t strictly necessary here.


469-490: Cluster recovery validation logic is clear and appropriate.

validateClusterRecovery():

  • Waits for GREEN with all 9 nodes present, with a 60s timeout,
  • Asserts no timeout,
  • Confirms the expected shard count (3), and
  • Verifies all shards are allocated to nodes with color == "red".

This neatly checks both cluster health and data placement after lag removal. Looks good.


558-602: Test-only IndexStorePlugin and listener are well-scoped and aligned with tests.

The nested TestIndexStoreListenerPlugin / TestIndexStoreListener setup:

  • Properly implements IndexStorePlugin#getIndexStoreListener using Optional,
  • Uses a static, volatile flag DELAY_SHARD_ASSIGNMENT gated from the test to control behavior,
  • Introduces a named SHARD_DELETE_DELAY_SECONDS constant, and
  • Logs both before and after the artificial delay, with correct interrupt handling.

Combined with the annotation-based plugin gating and the try/finally around DELAY_SHARD_ASSIGNMENT, this is a solid, test-scoped mechanism for simulating slow shard cleanup on the applier thread.

No changes needed here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 9, 2025

❌ Gradle check result for 1ec8f19: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from 1ec8f19 to e7d4f6f Compare December 16, 2025 09:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

417-424: Ensure transient cluster settings are cleaned up.

The transient settings (follower lag timeout, publish timeout, concurrent recoveries) are set here but not explicitly cleaned up. While the test framework may reset these between test classes, explicit cleanup is safer to prevent test pollution.

Consider adding a cleanup method or capturing and restoring the original values:

+    private Settings originalTransientSettings;
+
     private void additionalSetupForLagDuringDataMigration() {
+        // Capture current settings
+        originalTransientSettings = internalCluster().client().admin().cluster().prepareState().get()
+            .getState().metadata().transientSettings();
+        
         // ... existing setup code ...

Then in tearDown():

     @After
     public void tearDown() throws Exception {
+        if (originalTransientSettings != null) {
+            ClusterUpdateSettingsRequest resetRequest = new ClusterUpdateSettingsRequest();
+            resetRequest.transientSettings(Settings.builder().put(originalTransientSettings).build());
+            internalCluster().client().admin().cluster().updateSettings(resetRequest).actionGet();
+        }
         testLogsAppender.clearCapturedLogs();
         // ... existing teardown code ...
🧹 Nitpick comments (2)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (2)

145-150: Consider using Arrays.asList or List.of for initialization.

The double-brace initialization creates an anonymous inner class, which adds slight overhead. For cleaner code, consider:

-        List<String> messagesToCapture = new ArrayList<String>() {
-            {
-                add("failed to join");
-                add("IllegalStateException");
-            }
-        };
+        List<String> messagesToCapture = new ArrayList<>(Arrays.asList(
+            "failed to join",
+            "IllegalStateException"
+        ));

494-504: Consider extracting the delay value to a constant.

The hardcoded 30 * 1000 on line 498 should be extracted to a named constant for consistency with TestIndexStoreListener.SHARD_DELETE_DELAY_SECONDS and better maintainability.

+    private static final int APPLIER_DELAY_SECONDS = 30;
+
     private ClusterStateListener createDelayListener(ClusterApplierService applierService) {
         return event -> applierService.runOnApplierThread("NodeJoinLeftIT", clusterState -> {
             try {
-                logger.info("Sleeping for 30 seconds");
-                Thread.sleep(30 * 1000);
+                logger.info("Sleeping for {} seconds", APPLIER_DELAY_SECONDS);
+                Thread.sleep(APPLIER_DELAY_SECONDS * 1000);
             } catch (InterruptedException e) {
                 logger.info("Interrupted while waiting for cluster state applier");
                 Thread.currentThread().interrupt();
             }
         }, (source, e) -> logger.error(() -> new ParameterizedMessage("{} unexpected error in listener wait", source), e));
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ec8f19 and e7d4f6f.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (9 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.md
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)
server/src/main/java/org/opensearch/cluster/coordination/FollowersChecker.java (1)
  • FollowersChecker (84-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: Analyze (java)
  • GitHub Check: Mend Security Check
🔇 Additional comments (7)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1)

1-22: LGTM!

The annotation definition is clean and follows standard patterns for runtime marker annotations. The javadoc clearly explains its purpose.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (6)

108-130: LGTM!

The annotation-based plugin loading mechanism is well-implemented. Using reflection to detect the @RequiresTestIndexStoreListener annotation at runtime is appropriate for test infrastructure and addresses the previous concern about fragile string-based matching.


351-368: LGTM!

The try-finally pattern ensures DELAY_SHARD_ASSIGNMENT is always reset, preventing test pollution. The explicit reset on line 363 before the finally block appears redundant but is acceptable for clarity.


370-396: LGTM!

The try-finally pattern ensures cluster state listeners are always removed, even if validation fails. This prevents the 30-second delays from leaking into subsequent tests and addresses the previous concern about resource cleanup.


442-469: LGTM!

The validation logic comprehensively checks for expected cluster behavior: node drops, lag-induced delays, and join failures. The timeout values are reasonable for integration tests.


471-492: LGTM!

The recovery validation ensures the cluster returns to a healthy state and all shards are migrated to red nodes as expected. The assertions provide good coverage of the recovery scenario.


560-604: LGTM!

The test plugin and listener implementations are well-structured:

  • Correct use of volatile for thread-safe flag access
  • Named constant for delay duration improves maintainability
  • Proper interrupt status restoration in the exception handler
  • Clean integration with OpenSearch's IndexStorePlugin interface

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for e7d4f6f: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from e7d4f6f to fa2a531 Compare December 18, 2025 12:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

417-424: Transient cluster settings cleanup.

While @ClusterScope(scope = Scope.TEST) should recreate the cluster per test method (mitigating leakage risk), consider explicitly resetting these transient settings in the finally block of the calling test methods or in tearDown() for defensive coding.

The cluster scope is Scope.TEST, which recreates the cluster per test. Please confirm this guarantees settings isolation, or add explicit cleanup as a safeguard.

🧹 Nitpick comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

145-150: Prefer new ArrayList<>(List.of(...)) over double-brace initialization.

Double-brace initialization creates an anonymous inner class that holds a reference to the enclosing instance, which can cause memory leaks and increases bytecode size.

🔎 Suggested fix:
-        List<String> messagesToCapture = new ArrayList<String>() {
-            {
-                add("failed to join");
-                add("IllegalStateException");
-            }
-        };
+        List<String> messagesToCapture = new ArrayList<>(List.of(
+            "failed to join",
+            "IllegalStateException"
+        ));
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7d4f6f and fa2a531.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (9 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.md
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)
server/src/main/java/org/opensearch/cluster/coordination/FollowersChecker.java (1)
  • FollowersChecker (84-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: Mend Security Check
🔇 Additional comments (8)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1)

1-22: LGTM!

The annotation is properly defined with RUNTIME retention (required for reflection-based detection in requiresTestIndexStoreListener()) and METHOD target. Clean and minimal implementation.

server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (7)

107-130: LGTM!

The annotation-based approach for conditional plugin registration is well-implemented. The reflection logic correctly retrieves the test method and checks for @RequiresTestIndexStoreListener. Returning false on NoSuchMethodException is a safe default.


351-368: LGTM!

The try-finally block ensures DELAY_SHARD_ASSIGNMENT is reset even if the test fails. The annotation-based gating with @RequiresTestIndexStoreListener is properly applied.


370-396: LGTM!

The try-finally block ensures listeners are always removed, preventing pollution of subsequent tests. The validateClusterRecovery() call after cleanup is appropriately placed.


442-469: LGTM!

The validation method comprehensively verifies:

  1. Node drop occurred (cluster size < 9)
  2. Delay in shard cleanup was triggered (log verification)
  3. Node removal due to publication lag (log verification)
  4. Join request failures with expected exception

471-492: LGTM!

The recovery validation properly verifies:

  1. Cluster returns to green status with all 9 nodes
  2. All 3 shards are active (index-1: 1P+1R, test: 1P)
  3. All shards migrated to red nodes

494-504: LGTM!

The delay listener correctly:

  • Restores interrupt status with Thread.currentThread().interrupt()
  • Uses source (not e) in the ParameterizedMessage
  • Schedules work on the applier thread to simulate blocking

560-604: LGTM!

The plugin and listener implementation:

  • Uses a named constant SHARD_DELETE_DELAY_SECONDS instead of magic number
  • Properly restores interrupt status in the catch block
  • Uses volatile for thread-safe visibility of DELAY_SHARD_ASSIGNMENT
  • Includes informative logging before and after the delay

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for fa2a531: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…de cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
@SwethaGuptha SwethaGuptha force-pushed the node-join-left-integ-test branch from fa2a531 to 59ddad9 Compare December 18, 2025 16:33
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (1)

35-37: Variable shadowing: rename parameter to avoid confusion.

The parameter messagesToCapture shadows the instance field of the same name, which reduces code clarity.

🔎 Apply this diff to rename the parameter:
-public void addMessagesToCapture(Set<String> messagesToCapture) {
-    messagesToCapture.forEach(this::addMessageToCapture);
+public void addMessagesToCapture(Set<String> messages) {
+    messages.forEach(this::addMessageToCapture);
 }
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (2)

418-425: Consider explicit cleanup of transient cluster settings.

The transient cluster settings (follower_lag.timeout, publish.timeout, cluster_concurrent_recoveries) are modified here but not explicitly reset. While the test framework likely handles cleanup between tests, explicitly resetting these settings in a finally block or tearDown() method would be more robust and prevent potential test pollution.

🔎 Consider adding explicit cleanup:

You could capture the original values before modification and restore them in a finally block within each test method that calls this helper, or add a corresponding cleanup helper method that resets these to defaults.

Example approach:

private void cleanupClusterSettings() {
    ClusterUpdateSettingsRequest resetRequest = new ClusterUpdateSettingsRequest();
    Settings resetSettings = Settings.builder()
        .putNull("cluster.follower_lag.timeout")
        .putNull("cluster.publish.timeout")
        .putNull("cluster.routing.allocation.cluster_concurrent_recoveries")
        .build();
    resetRequest.transientSettings(resetSettings);
    internalCluster().client().admin().cluster().updateSettings(resetRequest).actionGet();
}

Then call this in the finally blocks of the test methods after removing listeners.


497-507: Consider extracting the 30-second delay to a constant.

The delay value is hardcoded here as 30 * 1000 (line 501), while TestIndexStoreListener uses a named constant SHARD_DELETE_DELAY_SECONDS = 30 (line 573). For consistency and maintainability, consider extracting this to a constant or reusing the existing one.

🔎 Apply this diff to improve consistency:

At class level, add:

+    private static final int APPLIER_DELAY_SECONDS = 30;

Then update the sleep:

-                logger.info("Sleeping for 30 seconds");
-                Thread.sleep(30 * 1000);
+                logger.info("Sleeping for {} seconds", APPLIER_DELAY_SECONDS);
+                Thread.sleep(APPLIER_DELAY_SECONDS * 1000);

Note: The interrupt handling and error callback are correctly implemented per previous review feedback.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa2a531 and 59ddad9.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (8 hunks)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java (1 hunks)
  • test/framework/src/main/java/org/opensearch/test/TestLogsAppender.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/RequiresTestIndexStoreListener.java
  • CHANGELOG.md
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)
server/src/main/java/org/opensearch/cluster/coordination/FollowersChecker.java (1)
  • FollowersChecker (84-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, macos-15)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Mend Security Check
🔇 Additional comments (9)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (9)

36-86: LGTM! Import additions support the new test infrastructure.

The added imports are necessary for the cluster state listener functionality, IndexStorePlugin integration, and improved logging with ParameterizedMessage.


108-130: LGTM! Annotation-based approach improves maintainability.

The reflection-based mechanism properly addresses the previous concern about fragile string matching. This approach is more robust and survives test method renames.


145-150: LGTM! Modifiable collection enables dynamic log message capture.

The switch to ArrayList allows the test framework to add additional messages to capture at line 439, which is necessary for the lag simulation tests.


159-169: LGTM! Field extraction enables settings reuse.

Extracting nodeSettings to a field allows consistent configuration across multiple node startups in the test methods.


351-368: LGTM! Try-finally ensures proper cleanup of static test state.

The test properly uses try-finally to guarantee that DELAY_SHARD_ASSIGNMENT is reset even if the test fails, preventing pollution of subsequent tests. This addresses the critical concern from previous reviews.


370-397: LGTM! Listener lifecycle properly managed with try-finally.

The test ensures that long-running cluster state listeners are always removed in the finally block, even if validation fails. This prevents the 30-second delays from affecting subsequent tests. Well done addressing the critical concern from previous reviews.


443-470: LGTM! Validation logic properly verifies publication lag scenario.

The method correctly waits for node drops and validates the expected log sequence: delay introduction, node removal due to lag, and join request failures. The 120-second timeout provides adequate buffer for the configured lag timeouts (5s + 15s = 20s) and the 30-second delay.


472-495: LGTM! Recovery validation ensures cluster stabilizes and shards migrate correctly.

The method properly validates that:

  1. The cluster recovers to green status with all 9 nodes
  2. All 3 shards are active (1 shard from "test" index + 2 shards from "index-1")
  3. All shards successfully migrated to red nodes

562-604: LGTM! Test infrastructure properly implements IndexStoreListener with correct interrupt handling.

The implementation correctly:

  • Uses a named constant for the delay duration
  • Restores thread interrupt status in the catch block (line 597)
  • Logs comprehensive diagnostic information including thread name and timing
  • Uses volatile boolean for cross-thread visibility of the DELAY_SHARD_ASSIGNMENT flag

All critical concerns from previous reviews have been properly addressed.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 59ddad9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: shwetathareja <shwetathareja@live.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 85da046: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@rajiv-kv rajiv-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes.

…ster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

162-169: Duplicate setting detected.

NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING is configured twice with different values: "10s" at line 164 and "100ms" at line 168. The second value overwrites the first, making line 164 ineffective dead code.

🔎 Proposed fix
         this.nodeSettings = Settings.builder()
             .put(RecoverySettings.INDICES_RECOVERY_RETRY_DELAY_NETWORK_SETTING.getKey(), "100ms")
-            .put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "10s")
             .put(FollowersChecker.FOLLOWER_CHECK_TIMEOUT_SETTING.getKey(), "200ms")
             .put(FollowersChecker.FOLLOWER_CHECK_INTERVAL_SETTING.getKey(), "100ms")
             .put(FollowersChecker.FOLLOWER_CHECK_RETRY_COUNT_SETTING.getKey(), 1)
             .put(NodeConnectionsService.CLUSTER_NODE_RECONNECT_INTERVAL_SETTING.getKey(), "100ms")
             .build();
🧹 Nitpick comments (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)

432-446: Consider extracting magic numbers to named constants.

The delay values (3 seconds at line 437, 30 seconds at line 510) are hardcoded. While SHARD_DELETE_DELAY_SECONDS is properly defined as a constant in TestIndexStoreListener, similar constants could improve readability for these delays.

🔎 Optional improvement
private static final int CM_APPLIER_DELAY_SECONDS = 3;
private static final int LISTENER_DELAY_SECONDS = 30;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85da046 and 131aa8d.

📒 Files selected for processing (1)
  • server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (1)
server/src/main/java/org/opensearch/cluster/coordination/FollowersChecker.java (1)
  • FollowersChecker (84-537)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: Analyze (java)
🔇 Additional comments (4)
server/src/internalClusterTest/java/org/opensearch/cluster/coordination/NodeJoinLeftIT.java (4)

111-133: Annotation-based plugin loading is a good improvement.

The use of @RequiresTestIndexStoreListener annotation with reflection-based detection is a clean solution that addresses the previous concern about string-based test name matching. This approach is more maintainable and won't break if tests are renamed.


354-400: LGTM! Proper resource cleanup with try-finally.

Both new test methods correctly use try-finally blocks to ensure cleanup:

  • testClusterStabilityWhenClusterStatePublicationLagsOnShardCleanup resets DELAY_SHARD_ASSIGNMENT in finally
  • testClusterStabilityWhenClusterStatePublicationLagsWithLongRunningListenerOnApplierThread removes listeners in finally

This addresses the previous concerns about static state pollution and listener leaks.


480-503: Validation logic looks correct.

The validateClusterRecovery method properly:

  1. Waits for green status with all 9 nodes
  2. Verifies exactly 3 active shards (1 from "test" index + 2 from "index-1")
  3. Confirms all shards migrated to red nodes via cluster state inspection

This addresses the previous review request to assert shard migration and data integrity.


574-622: LGTM! Plugin and listener implementation addresses all previous concerns.

The implementation correctly:

  1. Uses a named constant SHARD_DELETE_DELAY_SECONDS instead of magic number
  2. Restores interrupt status with Thread.currentThread().interrupt() in catch blocks
  3. Properly shuts down the ScheduledExecutorService after use
  4. Uses volatile for thread-safe access to DELAY_SHARD_ASSIGNMENT

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 131aa8d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 131aa8d: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.23%. Comparing base (84bc68e) to head (131aa8d).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19907      +/-   ##
============================================
+ Coverage     73.21%   73.23%   +0.01%     
+ Complexity    71776    71750      -26     
============================================
  Files          5795     5795              
  Lines        328304   328304              
  Branches      47281    47281              
============================================
+ Hits         240374   240432      +58     
+ Misses        68684    68540     -144     
- Partials      19246    19332      +86     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shwetathareja shwetathareja merged commit 5e80506 into opensearch-project:main Dec 29, 2025
45 of 60 checks passed
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
mohit10011999 pushed a commit to mohit10011999/OpenSearch that referenced this pull request Dec 30, 2025
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
@coderabbitai coderabbitai bot mentioned this pull request Jan 7, 2026
3 tasks
bharath-techie pushed a commit that referenced this pull request Jan 19, 2026
* Add preserve_dots paramater to ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Extend RootObjectMapper to support preserve_dots

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement flat field parsing in DocumentParser

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement validation in MapperService

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add index template support for preserve_dots tests

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* error handling with descriptive messages

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing redundancy

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding autoflattening at nested level as well

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing gradle spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Stab at optimizing the code via removing redundant code and fixing multiple functions

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing splitPathRespectingDisableObjects and trying to achieve it through existing getMapper function

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Revert "Reduction of if else statements by extracting conditional logic into focused methods"

This reverts commit 9ca7e6d.

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* [Plugins] Relax jar hell check when extended plugins share transitive dependencies (#20103)

* plugin service allow multiple extended plugins both extend the same module/plugin

Signed-off-by: Karen X <karenxyr@gmail.com>

* fix tests

Signed-off-by: Karen X <karenxyr@gmail.com>

---------

Signed-off-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>

* Reverting change in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add integ test for simulating node join left event when data node clu… (#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding few more UTs in DocumentParserTests and ObjectMapperTests for code patch coverage

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding changes in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow1

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow2

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow3

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper second round and refactoring of code especially dynamic update portion

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing dead end code from Document Parser related to parseArray and parseObject

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow5

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow7

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow8

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing breaking changes in ObjectMapper builder method

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing org.opensearch.indices.template.SimpleIndexTemplateIT IT

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing array mapping  with nested disable_objects as true

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Addressed review comments and removed redundant code

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

---------

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: mohit10011999 <pinkijyoti1995@gmail.com>
Signed-off-by: Karen X <karenxyr@gmail.com>
Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Co-authored-by: Mohit Kumar <mohitamg@amazon.com>
Co-authored-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>
Co-authored-by: Swetha Guptha <gupthasg@amazon.com>
tanik98 pushed a commit to tanik98/OpenSearch that referenced this pull request Feb 18, 2026
opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
…rch-project#19958)

* Add preserve_dots paramater to ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Extend RootObjectMapper to support preserve_dots

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement flat field parsing in DocumentParser

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement validation in MapperService

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add index template support for preserve_dots tests

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* error handling with descriptive messages

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing redundancy

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding autoflattening at nested level as well

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing gradle spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Stab at optimizing the code via removing redundant code and fixing multiple functions

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing splitPathRespectingDisableObjects and trying to achieve it through existing getMapper function

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Revert "Reduction of if else statements by extracting conditional logic into focused methods"

This reverts commit 9ca7e6d.

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* [Plugins] Relax jar hell check when extended plugins share transitive dependencies (opensearch-project#20103)

* plugin service allow multiple extended plugins both extend the same module/plugin

Signed-off-by: Karen X <karenxyr@gmail.com>

* fix tests

Signed-off-by: Karen X <karenxyr@gmail.com>

---------

Signed-off-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>

* Reverting change in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add integ test for simulating node join left event when data node clu… (opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding few more UTs in DocumentParserTests and ObjectMapperTests for code patch coverage

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding changes in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow1

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow2

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow3

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper second round and refactoring of code especially dynamic update portion

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing dead end code from Document Parser related to parseArray and parseObject

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow5

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow7

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow8

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing breaking changes in ObjectMapper builder method

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing org.opensearch.indices.template.SimpleIndexTemplateIT IT

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing array mapping  with nested disable_objects as true

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Addressed review comments and removed redundant code

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

---------

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: mohit10011999 <pinkijyoti1995@gmail.com>
Signed-off-by: Karen X <karenxyr@gmail.com>
Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Co-authored-by: Mohit Kumar <mohitamg@amazon.com>
Co-authored-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>
Co-authored-by: Swetha Guptha <gupthasg@amazon.com>
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
…rch-project#19958)

* Add preserve_dots paramater to ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Extend RootObjectMapper to support preserve_dots

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement flat field parsing in DocumentParser

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Implement validation in MapperService

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add index template support for preserve_dots tests

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* error handling with descriptive messages

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing redundancy

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more edge cases, autoflatten and test cases

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding autoflattening at nested level as well

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing gradle spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Stab at optimizing the code via removing redundant code and fixing multiple functions

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing splitPathRespectingDisableObjects and trying to achieve it through existing getMapper function

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Reduction of if else statements by extracting conditional logic into focused methods

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Revert "Reduction of if else statements by extracting conditional logic into focused methods"

This reverts commit 9ca7e6d.

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* [Plugins] Relax jar hell check when extended plugins share transitive dependencies (opensearch-project#20103)

* plugin service allow multiple extended plugins both extend the same module/plugin

Signed-off-by: Karen X <karenxyr@gmail.com>

* fix tests

Signed-off-by: Karen X <karenxyr@gmail.com>

---------

Signed-off-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>

* Reverting change in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Add integ test for simulating node join left event when data node clu… (opensearch-project#19907)

* Add integration test for simulating node join left event when data node cluster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding few more UTs in DocumentParserTests and ObjectMapperTests for code patch coverage

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding changes in CHANGELOG.md

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow1

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow2

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow3

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Adding more tests for code patch coverage for DocumentParser and ObjectMapper second round and refactoring of code especially dynamic update portion

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Removing dead end code from Document Parser related to parseArray and parseObject

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow4

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow5

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow6

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow7

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow8

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing breaking changes in ObjectMapper builder method

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing spotless checks

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Trigger workflow9

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Handling v2 templates for dotted field feature and stab at using existing methods in switch case

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing org.opensearch.indices.template.SimpleIndexTemplateIT IT

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Fixing array mapping  with nested disable_objects as true

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

* Addressed review comments and removed redundant code

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

---------

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: mohit10011999 <pinkijyoti1995@gmail.com>
Signed-off-by: Karen X <karenxyr@gmail.com>
Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
Co-authored-by: Mohit Kumar <mohitamg@amazon.com>
Co-authored-by: Karen X <karenxyr@gmail.com>
Co-authored-by: karenx <karenx@uber.com>
Co-authored-by: Swetha Guptha <gupthasg@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants