
Add adaptive merge policy to reduce benchmark variance#19352

Closed
rgsriram wants to merge 5 commits into opensearch-project:main from rgsriram:issue-11163

Conversation


@rgsriram rgsriram commented Sep 21, 2025

Description

Introduced an adaptive merge policy system in OpenSearch that dynamically tunes Lucene segment merge settings based on shard size. The solution replaces static defaults with context-aware configurations, improving benchmark consistency and search performance across varying index sizes. Key additions include an AdaptiveTieredMergePolicyProvider for shard-specific optimisations, a SegmentTopologyAnalyzer for segment distribution analysis, and a REST API for real-time monitoring. This change delivers more predictable performance and optimised search behaviour for users.

Related Issues

Resolves #11163

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Adaptive merge policy that dynamically tailors merge settings by shard size and provides automated recommendations.
    • Segment topology analyzer and calculator to compute segment metrics and produce merge recommendations.
  • Documentation

    • Changelog entry added and package-level admin REST documentation included.
  • Tests

    • New unit tests, test utilities, and benchmarks validating topology analysis and adaptive policy recommendations.


@github-actions

❌ Gradle check result for 2b398ff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for abe1e49: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for 5076ee5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/AdaptiveTieredMergePolicyProvider.java (2)

26-29: Documentation describes implementation details not present in this class.

The javadoc mentions "smooth interpolation (log/linear across size decades)" (lines 27-29), but this logic doesn't exist in AdaptiveTieredMergePolicyProvider. The class simply initializes a wrapped AdaptiveOpenSearchTieredMergePolicy with static defaults. If the interpolation logic resides in the wrapped policy, update the documentation to clarify that this provider delegates adaptive behavior to AdaptiveOpenSearchTieredMergePolicy, rather than implying this class performs the computation.

📝 Suggested documentation clarification
  * Implementation notes:
- * - Uses smooth interpolation (log/linear across size decades) instead of hard
- * categories
- * to avoid abrupt parameter jumps as shards grow.
+ * - Delegates to AdaptiveOpenSearchTieredMergePolicy which uses smooth 
+ * interpolation (log/linear across size decades) to avoid abrupt parameter 
+ * jumps as shards grow.
  * - Caps the max merged segment size at 5GB to align with Lucene defaults.

59-77: Consider acquiring settingsLock when mutating policy parameters.

applyDefaultSettings() modifies tieredMergePolicy state (lines 65–76) without acquiring settingsLock. While this is currently safe because the method is only called from the constructor (before the object is published), future refactorings or concurrent access patterns could introduce races. For consistency and defensive coding, consider wrapping the setter calls in a write lock:

🔒 Proposed lock usage
 private void applyDefaultSettings() {
+    settingsLock.writeLock().lock();
+    try {
         // Fallback to the original default settings, ensuring parity with
         // TieredMergePolicyProvider
         // We use explicit values here, but they match the constants in
         // TieredMergePolicyProvider:
         // DEFAULT_MAX_MERGED_SEGMENT = 5GB
         tieredMergePolicy.setMaxMergedSegmentMB(5 * 1024);
         // DEFAULT_FLOOR_SEGMENT = 16MB
         tieredMergePolicy.setFloorSegmentMB(16);
         // DEFAULT_SEGMENTS_PER_TIER = 10.0
         tieredMergePolicy.setSegmentsPerTier(10.0);
         // DEFAULT_MAX_MERGE_AT_ONCE = 30
         tieredMergePolicy.setMaxMergeAtOnce(30);
         // DEFAULT_EXPUNGE_DELETES_ALLOWED = 10.0
         tieredMergePolicy.setForceMergeDeletesPctAllowed(10.0);
         // DEFAULT_DELETES_PCT_ALLOWED = 20.0
         tieredMergePolicy.setDeletesPctAllowed(20.0);
         tieredMergePolicy.setNoCFSRatio(TieredMergePolicy.DEFAULT_NO_CFS_RATIO);
+    } finally {
+        settingsLock.writeLock().unlock();
+    }
 }
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (1)

246-262: Consider tracking segments by identity rather than size value.

Line 253 uses trackedSizes.remove(Long.valueOf(info.sizeInBytes())), which removes by object equality. If multiple segments have identical sizes, this will only remove the first matching value, potentially causing tracking drift. While collisions may be rare due to the random noise added at line 221, using a different data structure (e.g., parallel tracking with indices or a Map) would be more robust.

🔎 Alternative approach using indexed tracking

Replace trackedSizes List with a Map that tracks segment name to size:

-    List<Long> trackedSegmentSizes = new ArrayList<>();
+    Map<String, Long> trackedSegmentSizes = new HashMap<>();
     // ...
     for (int i = 0; i < 5; i++) {
         long size = 50L * 1024 * 1024;
         String name = "_init_" + i;
         segmentInfos.add(createSegmentCommitInfo(name, size));
-        trackedSegmentSizes.add(size);
+        trackedSegmentSizes.put(name, size);
     }
     // ...
     // When adding new segment:
-    trackedSegmentSizes.add(newSegmentSize);
+    trackedSegmentSizes.put(newSegmentName, newSegmentSize);
     
     // In applyMerges:
     for (SegmentCommitInfo info : merge.segments) {
         newSize += info.sizeInBytes();
         infos.remove(info);
-        trackedSizes.remove(Long.valueOf(info.sizeInBytes()));
+        trackedSizes.remove(info.info.name);
     }
-    trackedSizes.add(newSize);
+    trackedSizes.put(mergedName, newSize);
     
     // When calculating metrics:
-    SegmentTopologyMetrics metrics = calculateTopologyMetrics(trackedSegmentSizes);
+    SegmentTopologyMetrics metrics = calculateTopologyMetrics(new ArrayList<>(trackedSegmentSizes.values()));
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a38b165 and e295923.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/index/AdaptiveTieredMergePolicyProvider.java
  • server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (1)
server/src/test/java/org/opensearch/index/analysis/SegmentTopologyTestUtils.java (1)
  • SegmentTopologyTestUtils (19-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
🔇 Additional comments (17)
server/src/main/java/org/opensearch/index/AdaptiveTieredMergePolicyProvider.java (3)

34-39: LGTM!

Field declarations are correct. All fields are properly marked final, and the settingsLock provides thread-safety for concurrent merge policy modifications as intended.


41-57: LGTM!

Constructor correctly initializes the adaptive policy with thread-safe lock, reads the merge-enabled setting, logs appropriate warnings, and applies defaults.


79-82: LGTM!

The getMergePolicy() method correctly returns the adaptive policy when merges are enabled, or NoMergePolicy otherwise. The conditional logic is clear and the use of fully qualified names improves readability.

server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (14)

1-44: LGTM! Clean test class structure.

The imports, class declaration, and javadoc are well-organized. The past issue regarding the @opensearch.api tag has been properly addressed.


50-56: LGTM! Good coverage of shard size scenarios.

The test appropriately validates recommendations across a representative range of shard sizes from 50MB to 50GB.


61-69: LGTM! Clear validation of topology quality metrics.

The test effectively demonstrates that optimal topologies produce lower variance and better balance scores compared to pathological cases.


126-136: LGTM! Thorough boundary condition testing.

The test properly validates all category boundaries, ensuring the categorization logic works correctly at the threshold values.


141-161: LGTM! Comprehensive regression protection.

The test validates all merge policy parameters with exact expected values, providing strong regression protection. The previous issue regarding max segment validation has been properly addressed.


163-179: LGTM! Solid sanity checks for recommendations.

The validation method appropriately verifies that recommendations are positive, within expected bounds, and maintain proper relationships (e.g., floor < max).


181-198: LGTM! Effective topology simulation for benchmarking.

The bad and optimal topology scenarios clearly demonstrate the impact of segment distribution on variance and balance metrics, providing good test coverage.


264-291: LGTM! Proper mock setup for segment simulation.

The method correctly creates mock segments with realistic file structure and size reporting, suitable for testing merge policy behavior.


293-303: LGTM! Clear categorization logic.

The shard size categorization is straightforward and aligns with the boundaries documented in the javadoc (lines 121-124).


305-336: LGTM! Comprehensive validation with regression protection.

The method properly validates all merge policy parameters with both sanity checks and exact value matching. The previous issue regarding missing max segment validation has been resolved.


338-352: LGTM! Numerically sound variance calculation.

The method correctly calculates variance using double precision to avoid overflow, addressing the previous review concern.


354-362: LGTM! Intuitive balance score calculation.

The balance score formula (1 - range/max) provides an intuitive metric where 1.0 represents perfect balance and lower values indicate increasing imbalance.
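To make the metric described above concrete, here is a minimal standalone sketch of a 1 − range/max balance score. The class and method names are illustrative, not the PR's actual test code:

```java
import java.util.List;

// Standalone sketch of the balance score described above: 1 - (max - min) / max.
// A value of 1.0 means all segments are the same size; values near 0 indicate
// one segment dwarfs the smallest.
public class BalanceScoreSketch {
    public static double balanceScore(List<Long> segmentSizes) {
        if (segmentSizes.isEmpty()) {
            return 1.0; // treat an empty topology as trivially balanced
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long size : segmentSizes) {
            min = Math.min(min, size);
            max = Math.max(max, size);
        }
        if (max == 0) {
            return 1.0; // all-empty segments: no imbalance to measure
        }
        return 1.0 - (double) (max - min) / max;
    }

    public static void main(String[] args) {
        // Identical sizes: perfect balance.
        System.out.println(balanceScore(List.of(100L, 100L, 100L))); // 1.0
        // One segment twice the size of another.
        System.out.println(balanceScore(List.of(50L, 100L))); // 0.5
    }
}
```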


364-395: LGTM! Clean inner class definitions.

The inner classes are well-structured data holders appropriate for test metrics. The SegmentTopologyMetrics class correctly uses double for variance as addressed in previous reviews.


83-116: The adaptive policy is properly configured through segment topology at merge time.

The AdaptiveOpenSearchTieredMergePolicy is designed to dynamically compute shard size from segment metadata in the findMerges() method rather than requiring explicit configuration at construction. The test properly exercises this behavior: simulateMergePerformance() creates realistic SegmentInfos and invokes findMerges(), which triggers the adaptive logic to recalculate merge settings based on the current segment topology. No additional shard size configuration is needed or expected.

Likely an incorrect or invalid review comment.

@github-actions

github-actions bot commented Jan 1, 2026

❌ Gradle check result for e295923: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sriram Ganesh <srignsh22@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (2)

221-227: Minor typo in comment.

Line 222: "Flashing" should be "Flushing".

🔎 Proposed fix
         for (int i = 0; i < iterations; i++) {
-            // 1. Simulate Flashing a new segment
+            // 1. Simulate Flushing a new segment
             long newSegmentSize = docsPerFlush * baseDocSize + randomIntBetween(0, 1024 * 1024); // Add some noise

340-354: Consider adding empty list guard for consistency.

calculateBalanceScore handles the empty list case (line 357), but calculateTopologyMetrics would throw ArithmeticException on line 342 if passed an empty list. While current callers always pass non-empty lists, adding a guard would be more defensive.

🔎 Proposed fix
     private SegmentTopologyMetrics calculateTopologyMetrics(List<Long> segmentSizes) {
+        if (segmentSizes.isEmpty()) {
+            return new SegmentTopologyMetrics(0.0, 0.0);
+        }
         long totalSize = segmentSizes.stream().mapToLong(Long::longValue).sum();
         long meanSize = totalSize / segmentSizes.size();
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e295923 and e40dce4.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/index/AdaptiveTieredMergePolicyProvider.java
  • server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (1)
server/src/test/java/org/opensearch/index/analysis/SegmentTopologyTestUtils.java (1)
  • SegmentTopologyTestUtils (19-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/AdaptiveTieredMergePolicyProvider.java (1)

1-88: LGTM! Well-structured adaptive merge policy provider.

The implementation correctly addresses the concerns from previous reviews:

  • Uses INDEX_MERGE_ENABLED constant instead of hardcoded string
  • Removed dead code (categorization methods, unused settings update method)
  • Proper thread-safety with ReentrantReadWriteLock passed to the underlying policy
  • Default values align with TieredMergePolicyProvider (5GB max segment, 16MB floor, etc.)

The lock pattern in applyDefaultSettings() with try-finally is correct, and returning the policy reference from getMergePolicy() without locking is safe since both mergesEnabled and tieredMergePolicy are final fields set once in the constructor.

server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (2)

46-163: Well-structured benchmark tests with good regression coverage.

The test file properly addresses the previous review feedback:

  • Removed the @opensearch.api tag from the test class
  • Variance calculation now uses double to prevent overflow (lines 344-349)
  • Added expectedMaxSegment parameter to validateMergeSettings for complete regression protection (lines 307-338)

The tests provide good coverage for validating the adaptive merge policy across different shard sizes and comparing performance characteristics between default and adaptive policies.


83-85: Remove the unnecessary @SuppressForbidden annotation.

The test method does not call any forbidden APIs. The @Seed annotation is from the randomizedtesting library and is not subject to forbidden-apis restrictions. The @SuppressForbidden annotation serves no purpose here and should be removed.

    @Seed("DEADBEEF")
    public void testPerformanceComparison() throws IOException {

Likely an incorrect or invalid review comment.

@github-actions

github-actions bot commented Jan 1, 2026

❌ Gradle check result for e40dce4: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sriram Ganesh <srignsh22@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e40dce4 and da9f024.

📒 Files selected for processing (1)
  • server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (1)
server/src/test/java/org/opensearch/index/analysis/SegmentTopologyTestUtils.java (1)
  • SegmentTopologyTestUtils (19-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (6)
server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java (6)

1-46: LGTM!

License header, imports, and class declaration are well-structured. The @opensearch.api tag removal from the previous review has been addressed.


52-71: LGTM!

Test coverage for recommendations validation and topology variance impact is well-structured with clear assertions.


131-166: LGTM!

The categorization boundary tests and merge settings validation provide solid regression protection. The previous review feedback about adding expectedMaxSegment has been addressed with exact values now being validated for all three parameters.


269-296: LGTM!

The createSegmentCommitInfo helper correctly sets up the mock Directory to handle the .cfe file extension, and setFiles() is consistent with this mock configuration. The hardcoded zero values for delete counts are appropriate for this simulation context.


343-370: LGTM!

The variance overflow issue from the previous review has been addressed by using double arithmetic. The balance score calculation and edge case handling are correct.


372-403: LGTM!

The inner data classes are well-structured for test-only use. The SegmentTopologyMetrics class now correctly uses double for variance, addressing the previous review feedback.

Comment on lines +83 to +85
@Seed("DEADBEEF")
@SuppressForbidden(reason = "Benchmark requires deterministic seed")
public void testPerformanceComparison() throws IOException {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the complete test method to check for forbidden APIs
cat -n server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java | head -150

Repository: opensearch-project/OpenSearch

Length of output: 8329


🏁 Script executed:

#!/bin/bash
# Search for forbidden-apis configuration in OpenSearch
fd -t f "forbidden" -E ".git" -E "target" | head -20

Repository: opensearch-project/OpenSearch

Length of output: 879


🏁 Script executed:

#!/bin/bash
# Find examples of @SuppressForbidden usage to understand what triggers it
rg "@SuppressForbidden" --type java -B2 -A5 | head -100

Repository: opensearch-project/OpenSearch

Length of output: 12759


🏁 Script executed:

#!/bin/bash
# Check if this class uses any System.* or other commonly forbidden APIs
rg "System\.(getenv|exit|setSecurityManager|setOut|setErr|setIn)" server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java

Repository: opensearch-project/OpenSearch

Length of output: 55


Remove @SuppressForbidden annotation—no forbidden APIs are used.

The method contains no forbidden API calls (no System.*, Thread.suspend/resume, HttpServer, etc.). The @SuppressForbidden annotation is designed to suppress legitimate forbidden API usage; applying it with a reason about deterministic seeds is incorrect. The @Seed annotation from randomizedtesting is not a forbidden API and requires no suppression.

🤖 Prompt for AI Agents
In server/src/test/java/org/opensearch/index/SegmentTopologyBenchmarkTests.java
around lines 83 to 85, remove the unnecessary @SuppressForbidden annotation on
the testPerformanceComparison method; the method does not use any forbidden APIs
and the @Seed annotation is sufficient for deterministic seeding. Delete the
@SuppressForbidden line and its reason text so only @Seed("DEADBEEF") remains
above the method signature.

@github-actions

github-actions bot commented Jan 2, 2026

❌ Gradle check result for da9f024: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sriram Ganesh <srignsh22@gmail.com>
@github-actions

github-actions bot commented Jan 2, 2026

✅ Gradle check result for b1d79c9: SUCCESS

@rgsriram

@msfroh - Could you please review?

@atris

atris commented Jan 12, 2026

Not a fan of this approach.

Hardcoding 5GB/10GB thresholds in the merge policy feels like a heuristic that won't age well, especially with mixed workloads (high deletes, small hot indices). It introduces "sloshing" behavior where a shard can flip-flop configs just by crossing an arbitrary byte boundary.

Also, MergePolicy is supposed to be a stateless factory. Mutating the config inside findMerges is an anti-pattern; it makes debugging concurrency issues a nightmare.

If we need adaptive merging, it belongs in ISM where we see the whole cluster, not hacked into the low-level Lucene merge logic. I'm -1 on merging this.
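The "sloshing" concern above can be illustrated with a hedged sketch. The 5GB threshold and category names are assumptions for illustration, not the PR's exact values:

```java
// Illustrates flip-flop behavior at a hard byte-size boundary: a shard whose
// size oscillates around the threshold (e.g. as deletes are merged away and
// new docs arrive) switches merge configurations with no hysteresis.
public class ThresholdSloshingSketch {
    public static final long FIVE_GB = 5L * 1024 * 1024 * 1024;

    // Category is decided purely by a byte comparison; crossing the boundary
    // by a single byte flips the returned configuration.
    public static String category(long shardSizeBytes) {
        return shardSizeBytes < FIVE_GB ? "medium" : "large";
    }

    public static void main(String[] args) {
        long justUnder = FIVE_GB - 1024;
        long justOver = FIVE_GB + 1024;
        // A delete-expunging merge can drop the shard back under the line,
        // flipping merge settings again on the next evaluation.
        System.out.println(category(justUnder)); // medium
        System.out.println(category(justOver));  // large
    }
}
```

A common mitigation is hysteresis (e.g. only reclassify after the size moves some percentage past the boundary), which is one reason reviewers ask for empirical justification of exact thresholds.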

double logSize = Math.log10(Math.max(1L, shardSizeBytes));
// Thresholds are based on decimal powers: log10 < 8.0 ≈ < 100MB, < 9.0 ≈ < 1GB, etc.
// Returned sizes use binary units (MiB/GiB via 1024²)
if (logSize < 8.0) { // < 10^8 bytes (≈ 95.4 MiB)

Hardcoding static byte-size thresholds (50MB, 1GB, 5GB) is brittle. A 5GB index on NVMe behaves differently than on spinning disk; using a fixed "one size fits all" curve here ignores hardware capabilities and I/O pressure.

// Returned sizes use binary units (MiB/GiB via 1024²)
if (logSize < 8.0) { // < 10^8 bytes (≈ 95.4 MiB)
return 50L * 1024 * 1024; // 50 MiB
} else if (logSize < 9.0) { // 10^8 - 10^9 bytes (≈ 95.4 MiB - 953.7 MiB)

Please link the specific OSB results that empirically validate 5GB as the optimal inflection point for this curve. Without data showing a statistically significant regression at 4.9GB vs 5.1GB, these magic numbers are risky.


// Synchronize settings updates to prevent race conditions with concurrent
// setter calls
settingsLock.writeLock().lock();

findMerges runs under IndexWriter's concurrency model. Introducing this lock asks for deadlocks if
IndexWriter holds its own monitor. Have you verified the lock ordering here?

// setter calls
settingsLock.writeLock().lock();
try {
regularMergePolicy.setMaxMergedSegmentMB(maxMergedSegmentMB);

IndexSettings instantiates this policy once per index (via the Provider). This means all shards share this single mutable instance. A large shard will flip these settings for a small neighbor concurrently, causing non-deterministic behavior.
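A hedged sketch of the shared-mutable-instance hazard described above. `SharedPolicy` is a stand-in class, not the PR's actual type:

```java
// Two shards of the same index tuning one shared policy instance: the last
// writer wins for both shards, so effective merge settings become
// non-deterministic and depend on thread scheduling.
public class SharedPolicyRaceSketch {
    public static class SharedPolicy {
        // volatile gives atomic visibility but does nothing to serialize
        // competing configuration decisions from different shards.
        public volatile double maxMergedSegmentMB;

        public void setMaxMergedSegmentMB(double mb) {
            maxMergedSegmentMB = mb;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedPolicy policy = new SharedPolicy(); // one instance per index, shared by all shards
        Thread largeShard = new Thread(() -> policy.setMaxMergedSegmentMB(5 * 1024)); // wants 5 GiB
        Thread smallShard = new Thread(() -> policy.setMaxMergedSegmentMB(512));      // wants 512 MiB
        largeShard.start();
        smallShard.start();
        largeShard.join();
        smallShard.join();
        // Both shards now merge with whichever value happened to be written last.
        System.out.println(policy.maxMergedSegmentMB);
    }
}
```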

for (Segment segment : segments) {
sizes.add(segment.sizeBytes);
}
Collections.sort(sizes);

Allocating and sorting this on the heap per-request is a clear DoS vector on nodes with 50k+ segments. We need a streaming collector here, not a full sort.
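A streaming collector of the kind suggested above could look like this minimal sketch: a single pass over segments in O(1) memory, using Welford's algorithm for a numerically stable variance instead of materializing and sorting every size. Names are illustrative:

```java
// One-pass segment-size statistics: count, min, max, mean, and variance,
// without allocating or sorting a per-request list of all segment sizes.
public class StreamingSegmentStats {
    private long count;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private double mean;
    private double m2; // running sum of squared deviations from the mean (Welford)

    public void accept(long sizeBytes) {
        count++;
        min = Math.min(min, sizeBytes);
        max = Math.max(max, sizeBytes);
        double delta = sizeBytes - mean;
        mean += delta / count;
        m2 += delta * (sizeBytes - mean);
    }

    public double mean() { return mean; }

    public double populationVariance() { return count == 0 ? 0.0 : m2 / count; }

    public long min() { return min; }

    public long max() { return max; }

    public static void main(String[] args) {
        StreamingSegmentStats stats = new StreamingSegmentStats();
        for (long size : new long[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.accept(size);
        }
        System.out.println(stats.mean());               // ~5.0
        System.out.println(stats.populationVariance()); // ~4.0
    }
}
```

Quantiles (e.g. median segment size) cannot be computed exactly in one pass, but sketch structures like t-digest give bounded-error estimates without holding all sizes on the heap.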

@atris

atris commented Jan 12, 2026

The PR description mentions adding a REST API for real-time monitoring, but the PR doesn't seem to include one. Is that a feature planned for the future?

@rgsriram

Thanks @atris, for the detailed review. Closing this PR to rethink the architecture and explore an ISM-based implementation as suggested.


Labels

enhancement Enhancement or improvement to existing feature or request Search:Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize segment size and merge settings

3 participants