Implementing batched deletions of stale ClusterMetadataManifests in RemoteClusterStateCleanupManager and adding configurable deletion timeouts #20515
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository.
📝 Walkthrough

This pull request introduces timeout-aware, batched cleanup for remote cluster state files with configurable parameters. It adds TimeValue-based timeout handling across blob container deletion operations, implements batched manifest deletion in RemoteClusterStateCleanupManager, and introduces three new cluster settings for batch size, maximum batches, and cleanup timeout, preventing deletion-operation timeouts from causing a cascading buildup of stale files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CM as ClusterManager
    participant RCC as RemoteClusterStateCleanupManager
    participant BSTS as BlobStoreTransferService
    participant BC as BlobContainer
    participant RRTS as RemoteRoutingTableService

    CM->>RCC: scheduleCleanup()
    activate RCC
    RCC->>RCC: listAllManifests()
    RCC->>RCC: identifyStaleFiles()
    RCC->>RCC: initBatchLoop(batchSize=1000, maxBatches=100)
    loop for each batch until stale list exhausted or maxBatches reached
        RCC->>RRTS: deleteStaleIndexRoutingPaths(stalePaths, timeout)
        activate RRTS
        RRTS->>BC: deleteBlobsIgnoringIfNotExists(files, timeout)
        activate BC
        BC-->>RRTS: success
        deactivate BC
        RRTS-->>RCC: complete
        deactivate RRTS
        RCC->>RRTS: deleteStaleIndexRoutingDiffPaths(diffPaths, timeout)
        activate RRTS
        RRTS->>BC: deleteBlobsIgnoringIfNotExists(files, timeout)
        activate BC
        BC-->>RRTS: success
        deactivate BC
        RRTS-->>RCC: complete
        deactivate RRTS
        RCC->>BSTS: deleteBlobs(manifestPaths, timeout)
        activate BSTS
        BSTS->>BC: deleteBlobsIgnoringIfNotExists(manifests, timeout)
        activate BC
        BC-->>BSTS: success
        deactivate BC
        BSTS-->>RCC: complete
        deactivate BSTS
        alt Exception in batch
            RCC->>RCC: wrapInRemoteStateCleanupFailedException()
            RCC->>RCC: log error & continue next batch
        end
    end
    deactivate RCC
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (warning)
❌ Gradle check result for 0077d20: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…emoteClusterStateCleanupManager and adding configurable deletions timeouts Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
…manager-batched-deletions-and-timeouts Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java (1)
449-468: ⚠️ Potential issue | 🟠 Major

Precision loss: `timeout.seconds()` truncates a sub-second `TimeValue` to 0, causing an immediate `TimeoutException`.

`TimeValue.seconds()` performs integer division, so a value like `TimeValue.timeValueMillis(500)` yields `0`. Calling `future.get(0, SECONDS)` will time out instantly. Use `millis()` with `TimeUnit.MILLISECONDS` to preserve the full resolution of the provided `TimeValue`. The error message on line 467 has the same issue — it would report "0 seconds" for sub-second timeouts.
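The truncation is easy to reproduce with plain `java.util.concurrent`, independent of OpenSearch's `TimeValue` (the class and helper below are illustrative, not PR code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TruncationDemo {
    // Models the integer division inside TimeValue.seconds(): 500 ms -> 0 s.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    public static void main(String[] args) {
        long timeoutMillis = 500;
        System.out.println(toSeconds(timeoutMillis)); // prints 0

        // A pending future plus a zero timeout: get(0, SECONDS) raises
        // TimeoutException immediately instead of waiting the intended 500 ms.
        CompletableFuture<String> pending = new CompletableFuture<>();
        try {
            pending.get(toSeconds(timeoutMillis), TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out immediately"); // this branch runs
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing `timeout.millis()` with `TimeUnit.MILLISECONDS` instead preserves the caller's intent for any sub-second value.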
🐛 Proposed fix

```diff
 try {
     if (timeout.equals(TimeValue.MINUS_ONE)) {
         return future.get();
     } else {
-        return future.get(timeout.seconds(), TimeUnit.SECONDS);
+        return future.get(timeout.millis(), TimeUnit.MILLISECONDS);
     }
 } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     throw new IOException("Future got interrupted", e);
 } catch (ExecutionException e) {
     if (e.getCause() instanceof IOException) {
         throw (IOException) e.getCause();
     }
     throw new RuntimeException(e.getCause());
 } catch (TimeoutException e) {
     FutureUtils.cancel(future);
-    throw new IOException(String.format(Locale.ROOT, "Delete operation timed out after %s seconds", timeout.seconds()), e);
+    throw new IOException(String.format(Locale.ROOT, "Delete operation timed out after [%s]", timeout), e);
 }
```

server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (1)
773-787: ⚠️ Potential issue | 🟡 Minor

Spy `cleanUpManager` is created but never used.

Line 785 calls `remoteClusterStateCleanupManager.cleanUpStaleFiles()` (the original) instead of `cleanUpManager.cleanUpStaleFiles()` (the spy). The test passes because the skip-path doesn't reach `deleteStaleClusterMetadata` regardless, but this is inconsistent with `testRemoteCleanupCallsDeleteIfVersionIncrementGreaterThanThreshold` (line 802), which correctly calls `cleanUpManager.cleanUpStaleFiles()`.

Suggested fix

```diff
- remoteClusterStateCleanupManager.cleanUpStaleFiles();
+ cleanUpManager.cleanUpStaleFiles();
```
🤖 Fix all issues with AI agents
In
`@server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java`:
- Around line 340-362: This test omits disabling the periodic cleanup before
teardown which can race with shutdown; after the final assertions (after
computing last10Final) update the cleanup interval to -1 the same way other
tests do so the background task stops — e.g., call the
RemoteClusterStateCleanupManager cleanup interval setting setter (set the
RemoteClusterStateCleanupManager cleanup interval setting to -1) or invoke the
same helper used in
testRemoteCleanupDeleteStale/testRemoteCleanupDeleteStaleIndexRoutingFiles to
disable the cleanup task before the test exits.
In
`@server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java`:
- Around line 525-538: The Runnable submitted via threadpool.executor in
RemoteClusterStateCleanupManager currently throws
RemoteStateCleanupFailedException which becomes an uncaught exception on the
worker thread; instead remove the throw from inside the executor and replace it
with logging plus setting a failure indicator so callers can observe the failure
without crashing the thread. Specifically, in the Runnable (the block that logs
"Exception occurred while deleting Remote Cluster Metadata...") stop rethrowing
RemoteStateCleanupFailedException, keep/emphasize logger.error with the
exception, and set a class-level failure flag or status field (e.g., a boolean
like deleteStaleMetadataFailed or a status enum) so the outer manager can react;
update any code that previously relied on the exception to check that flag (and
re-evaluate that behavior once you make version advancement conditional on
success).
- Around line 217-230: The try/catch around deleteStaleClusterMetadata in
cleanUpStaleFiles is ineffective because deleteStaleClusterMetadata submits work
asynchronously via threadpool.executor(REMOTE_PURGE).execute(...) and returns
immediately, so RemoteStateCleanupFailedException thrown in the Runnable never
reaches this catch and lastCleanupAttemptStateVersion is always advanced; fix by
moving the update of lastCleanupAttemptStateVersion into the asynchronous
success path (i.e., set lastCleanupAttemptStateVersion =
cleanUpAttemptStateVersion only after the cleanup loop inside
deleteStaleClusterMetadata completes without error), and ensure when catching
RemoteStateCleanupFailedException inside the Runnable you log the exception
(include the Throwable) and do NOT advance lastCleanupAttemptStateVersion there;
alternatively make deleteStaleClusterMetadata synchronous if you prefer to keep
the update in cleanUpStaleFiles.
In
`@server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java`:
- Around line 1172-1177: Update the assertion message in
RemoteClusterStateCleanupManagerTests to accurately reflect the checked value:
the test currently asserts that the second invocation contains "manifest-91.dat"
via staleInvocations/staleCaptor but the message says "manifest-90"; change the
message to refer to "manifest-91.dat" (or alternately adjust the predicate to
check for "manifest-90" if that was intended) so the assertion message and the
anyMatch check on staleInvocations.get(0).stream() are consistent.
🧹 Nitpick comments (9)
server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java (1)
22-25: Redundant `toString()` override.

This override simply delegates to `super.toString()`, which is the default behavior. It can be removed.

♻️ Suggested removal

```diff
 public class RemoteStateCleanupFailedException extends RuntimeException {
     public RemoteStateCleanupFailedException(ParameterizedMessage parameterizedMessage, Throwable cause) {
         super(parameterizedMessage.toString(), cause);
     }
-
-    @Override
-    public String toString() {
-        return super.toString();
-    }
 }
```

plugins/repository-s3/src/test/java/org/opensearch/repositories/s3/S3BlobContainerTimeoutTests.java (1)
39-95: Consider adding edge-case tests for timeout behavior.

The current tests cover the happy path (successful completion) and verify call delegation, which is good. Consider adding tests for:

- `TimeValue.MINUS_ONE` → verifies the `future.get()` (infinite wait) path
- A future that doesn't complete within the timeout → verifies `TimeoutException` handling and future cancellation

These would exercise the branching logic in `getFutureValue` more thoroughly.

server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java (1)
285-295: Javadoc is missing `@param timeout` documentation.

The new `timeout` parameter is not documented in the Javadoc. Also, consider whether `warn` level is appropriate here — if a concrete implementation never overrides this method, every cleanup call will emit a warning. `debug` or `info` might be less noisy for a fallback path that's expected for most `BlobContainer` implementations.

Proposed Javadoc and log-level fix

```diff
 /**
  * Deletes the blobs with given names. This method will not throw an exception
  * when one or multiple of the given blobs don't exist and simply ignore this case.
  *
  * @param blobNames The names of the blob to delete.
+ * @param timeout   The timeout for the delete operation. Implementations that do not
+ *                  support timeouts will ignore this parameter and delegate to
+ *                  {@link #deleteBlobsIgnoringIfNotExists(List)}.
  * @throws IOException if a subset of blob exists but could not be deleted.
  */
 default void deleteBlobsIgnoringIfNotExists(List<String> blobNames, TimeValue timeout) throws IOException {
-    LogManager.getLogger(this.getClass()).warn("No implementation exists for using timeout, skipping timeout input");
+    LogManager.getLogger(this.getClass()).debug("No implementation exists for using timeout, skipping timeout input");
     deleteBlobsIgnoringIfNotExists(blobNames);
 }
```

server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java (1)
324-331: Prefer static import over inline fully-qualified class reference.

`java.util.stream.Collectors` is used inline on lines 330 and 347 as `java.util.stream.Collectors.toList()`. Consider adding a static import for consistency with the rest of the file's import style.

Add import at top of file

```diff
+import java.util.stream.Collectors;
```

Then replace `java.util.stream.Collectors.toList()` with `Collectors.toList()` on lines 330 and 347.
80-86: Consider delegating to the non-timeout variant instead of throwing.

The `BlobContainer` interface handles the same pattern by falling back to the non-timeout overload in its default method. Here, the default throws `UnsupportedOperationException`, which means any `RemoteRoutingTableService` implementation that doesn't override these will fail at runtime rather than degrade gracefully.

Since `InternalRemoteRoutingTableService` does override these, this isn't an immediate issue, but a consistent fallback pattern would be more defensive.

Proposed fallback to non-timeout variant

```diff
 default void deleteStaleIndexRoutingPaths(List<String> stalePaths, TimeValue timeout) throws IOException {
-    throw new UnsupportedOperationException("deleteStaleIndexRoutingPaths with timeouts is not supported");
+    deleteStaleIndexRoutingPaths(stalePaths);
 }

 default void deleteStaleIndexRoutingDiffPaths(List<String> stalePaths, TimeValue timeout) throws IOException {
-    throw new UnsupportedOperationException("deleteStaleIndexRoutingDiffPaths with timeouts is not supported");
+    deleteStaleIndexRoutingDiffPaths(stalePaths);
 }
```

server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java (1)
271-311: Consider reducing duplication between timeout and non-timeout variants.

The four methods (lines 271–311) share near-identical structure — the only difference is the extra `TimeValue` parameter. You could have the non-timeout methods delegate to the timeout variants with a `null` timeout, reducing the duplicated try/catch/log/rethrow blocks.

Example: delegate non-timeout to timeout variant

```diff
 @Override
 public void deleteStaleIndexRoutingPaths(List<String> stalePaths) throws IOException {
-    try {
-        logger.debug(() -> "Deleting stale index routing files from remote - " + stalePaths);
-        blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths);
-    } catch (IOException e) {
-        logger.error(() -> new ParameterizedMessage("Failed to delete some stale index routing paths from {}", stalePaths), e);
-        throw e;
-    }
+    deleteStaleIndexRoutingPaths(stalePaths, null);
 }

 @Override
 public void deleteStaleIndexRoutingPaths(List<String> stalePaths, TimeValue timeValue) throws IOException {
     try {
         logger.debug(() -> "Deleting stale index routing files from remote - " + stalePaths);
-        blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths, timeValue);
+        if (timeValue != null) {
+            blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths, timeValue);
+        } else {
+            blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths);
+        }
     } catch (IOException e) {
         logger.error(() -> new ParameterizedMessage("Failed to delete some stale index routing paths from {}", stalePaths), e);
         throw e;
     }
 }
```

Same pattern for the diff-path methods.
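The fallback-delegation idea behind both routing-service suggestions can be modeled with a plain Java interface (names below are illustrative, not the OpenSearch API): the new timeout overload is a default method, so implementations that predate the parameter keep compiling and simply ignore the timeout.

```java
import java.util.List;

// Minimal model: a new overload added as a default method that falls back to
// the pre-existing single-argument variant instead of throwing.
interface Container {
    void delete(List<String> names);

    default void delete(List<String> names, long timeoutMillis) {
        // Fallback: implementations without timeout support ignore the value.
        delete(names);
    }
}

public class FallbackDemo {
    static int deleted = 0;

    public static void main(String[] args) {
        // This implementation only provides the old single-argument method...
        Container c = names -> deleted += names.size();
        // ...yet the timeout overload still works via the default delegation.
        c.delete(List.of("a", "b"), 500);
        System.out.println(deleted); // prints 2
    }
}
```

The same shape degrades gracefully at runtime, whereas an `UnsupportedOperationException` default fails only when the new overload is first exercised.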
server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (1)
908-1013: Good batch-processing test coverage.

The four batched-deletion tests (`testBatchedDeletionWithSingleBatch`, `testBatchedDeletionWithMultipleBatches`, `testBatchedDeletionStopsWhenManifestsLessThanRetained`, `testBatchedDeletionExhaustsMaxBatches`) comprehensively cover the key scenarios. However, there is no test for the failure path where `deleteClusterMetadata` throws mid-batch and `RemoteStateCleanupFailedException` is raised. Adding such a test would help verify batch-level error handling and, importantly, would surface the `lastCleanupAttemptStateVersion`-advancement issue described in the production code review.

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (2)
179-201: Rescheduling the async deletion task on non-interval setting changes is unnecessary.

`rescheduleAsyncDeletionTask()` is called when `cleanupBatchSize`, `cleanupMaxBatches`, or `cleanupTimeout` are updated. These don't affect the task's scheduling interval — the next run will simply pick up the new field values. Only `updateCleanupInterval` genuinely needs to reschedule. The `rescheduleIfNecessary()` call is harmless but adds noise.
111-113: Consider `volatile` for fields written on the cluster-applier thread and read on `REMOTE_PURGE`.

`cleanupBatchSize`, `cleanupMaxBatches`, and `cleanupTimeout` are written by settings-update consumers (cluster-applier thread) and read by the batch-processing loop (`REMOTE_PURGE` thread). Without `volatile`, updated values may not be visible to the reader thread due to the Java Memory Model. This is consistent with the existing non-volatile `staleFileCleanupInterval`, but the new fields are read in a tight loop during batch processing, increasing the window for stale reads.
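The visibility guarantee that `volatile` buys can be illustrated with a small two-thread sketch (field name borrowed from the discussion; this is a model, not the manager's code):

```java
// One thread updates a setting, another reads it in a loop. With volatile the
// reader is guaranteed to eventually observe the write; a plain field carries
// no such guarantee under the Java Memory Model.
public class VolatileDemo {
    static volatile int cleanupBatchSize = 1000;

    public static void main(String[] args) {
        Thread reader = new Thread(() -> {
            while (cleanupBatchSize == 1000) {
                Thread.onSpinWait(); // spin until the update becomes visible
            }
            System.out.println("observed " + cleanupBatchSize);
        });
        reader.start();
        cleanupBatchSize = 2000; // "settings-update" thread
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping `volatile` here makes the reader's termination formally unguaranteed, which is exactly the stale-read risk flagged for the batch loop.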
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java
- plugins/repository-s3/src/test/java/org/opensearch/repositories/s3/S3BlobContainerTimeoutTests.java
- server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java
- server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/cluster/routing/remote/RemoteRoutingTableService.java
- server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java
- server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java
- server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java
- server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java
- server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java
- server/src/test/java/org/opensearch/common/blobstore/BlobContainerTests.java
- server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java
- server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java
- server/src/test/java/org/opensearch/index/translog/transfer/BlobStoreTransferServiceTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
- server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java
- server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java
- server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java
- server/src/main/java/org/opensearch/cluster/routing/remote/RemoteRoutingTableService.java
- server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java
- server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
- server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java
📚 Learning: 2026-01-14T09:31:12.773Z
Learnt from: udabhas
Repo: opensearch-project/OpenSearch PR: 20095
File: server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java:4146-4165
Timestamp: 2026-01-14T09:31:12.773Z
Learning: For OpenSearch’s S3 path, index-level CryptoMetadata is passed from BlobStoreRepository to S3BlobContainer, and the final KMS key/context merge with repository settings is performed in SseKmsUtil.configureEncryptionSettings. Do not merge in BlobStoreRepository to avoid double-merging.
Applied to files:
- server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java
- server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java
- plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (3)
server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java (1)
- BlobStoreTransferService (52-376)

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (2)

- RemoteClusterStateCleanupManager (48-655)
- AsyncStaleFileDeletion (626-654)

server/src/main/java/org/opensearch/threadpool/ThreadPool.java (1)

- Names (99-130)
server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)
server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (1)
- RemoteClusterStateCleanupManager (48-655)
server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (1)
server/src/main/java/org/opensearch/threadpool/ThreadPool.java (1)
- Names (99-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
🔇 Additional comments (17)
server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java (1)
183-186: LGTM! Consistent delegation pattern matching the existing non-timeout overload.
server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java (1)
105-107: LGTM! Backward-compatible interface extension with a sensible `UnsupportedOperationException` default, overridden by `BlobStoreTransferService`.

plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java (2)
125-125: LGTM! Good change from a raw `long` to a typed `TimeValue` constant, improving readability and consistency with the rest of the timeout-aware API surface.

441-446: LGTM! Clean timeout-forwarding overload, consistent with the non-timeout variant above.
server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java (1)
288-291: LGTM! Clean override of the `TransferService` default, consistently delegating to the timeout-aware blob container method.

server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java (1)
125-133: LGTM! Noop overloads consistent with the existing patterns in this class.
server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java (1)
317-328: LGTM! Clean test verifying timeout-aware delegation from `EncryptedBlobContainer` to the underlying `BlobContainer`.

server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)
784-786: LGTM! New cleanup settings are properly registered alongside existing remote cluster state settings. The settings definitions (from `RemoteClusterStateCleanupManager`) include appropriate validation bounds and are correctly scoped as `NodeScope` + `Dynamic`.

server/src/test/java/org/opensearch/common/blobstore/BlobContainerTests.java (1)
25-33: LGTM! Good use of `doCallRealMethod` to exercise the default interface method and verify it delegates to the non-timeout overload.

server/src/test/java/org/opensearch/index/translog/transfer/BlobStoreTransferServiceTests.java (2)
78-96: LGTM on new mock scaffolding. The mock fields and wiring are cleanly added without affecting existing tests (which construct their own `TransferService` instances).
319-327: LGTM on timeout delegation test. Cleanly verifies that `deleteBlobs(path, fileNames, timeout)` delegates to `deleteBlobsIgnoringIfNotExists(fileNames, timeout)` on the underlying container.
297-325: LGTM on timeout-aware verifications in `testDeleteClusterMetadata`. All deletion operations are correctly verified with `CLUSTER_STATE_CLEANUP_TIMEOUT_DEFAULT`, and the `never()` + `verifyNoMoreInteractions` combination ensures completeness.
827-906: Good coverage of new settings defaults and dynamic updates. These tests thoroughly validate default values, dynamic updates via `clusterSettings.applySettings`, and correct propagation.

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (4)
52-56: LGTM on new default constants. Clear naming and appropriate default values. `CLUSTER_STATE_CLEANUP_TIMEOUT_DEFAULT = TimeValue.MINUS_ONE` as "no timeout" is consistent with OpenSearch conventions.
73-105: LGTM on new settings. The batch size minimum of `RETAINED_MANIFESTS + SKIP_CLEANUP_STATE_CHANGES` (= 20) ensures there's always room for retention. A max-batches minimum of 1 and a timeout minimum of `MINUS_ONE` are appropriate.
426-453: Good ordering: manifests deleted last. Deleting stale routing paths and routing diff paths before their manifests avoids leaving dangling routing files if a subsequent batch or run fails. This aligns with the PR objective.
486-543: LGTM on the batch-processing loop structure. The batch loop with configurable `cleanupBatchSize` and `cleanupMaxBatches`, the guard via `deleteStaleMetadataRunning`, and the `finally` block releasing the lock are well-structured. The early exit when `batchManifests.size() <= manifestsToRetain` correctly terminates processing.
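The batching arithmetic this loop relies on can be modeled in isolation. The sketch below uses illustrative names and values, not the actual `RemoteClusterStateCleanupManager` fields: stale manifests (all but the newest `manifestsToRetain`) are carved into fixed-size batches, and the run is capped at `maxBatches`.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoopDemo {
    // Plans which manifests one cleanup run would delete, oldest first.
    static List<List<String>> planBatches(List<String> manifests, int manifestsToRetain, int batchSize, int maxBatches) {
        List<List<String>> batches = new ArrayList<>();
        int staleCount = manifests.size() - manifestsToRetain; // newest N are kept
        for (int b = 0; b < maxBatches && staleCount > 0; b++) {
            int from = b * batchSize;
            int to = Math.min(from + batchSize, staleCount);
            if (from >= to) {
                break; // stale list exhausted before maxBatches was reached
            }
            batches.add(new ArrayList<>(manifests.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> manifests = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            manifests.add("manifest-" + i); // index 0 is oldest
        }
        // 25 manifests, retain 10 -> 15 stale; batchSize 4, maxBatches 3
        // caps this run at 12 deletions; the remaining 3 wait for the next run.
        List<List<String>> batches = planBatches(manifests, 10, 4, 3);
        System.out.println(batches.size()); // 3
        System.out.println(batches.get(2)); // [manifest-8, manifest-9, manifest-10, manifest-11]
    }
}
```

The `maxBatches` cap is what bounds a single cleanup run; leftover stale manifests are simply picked up on a later run.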
```java
assertBusy(() -> {
    List<String> currentManifests = repository.blobStore()
        .blobContainer(manifestContainerPath)
        .listBlobsByPrefix("manifest")
        .keySet()
        .stream()
        .sorted()
        .collect(java.util.stream.Collectors.toList());
    assertTrue("Manifests should be cleaned up in batches", currentManifests.size() < initialManifests.size());
});

List<String> finalManifests = repository.blobStore()
    .blobContainer(manifestContainerPath)
    .listBlobsByPrefix("manifest")
    .keySet()
    .stream()
    .sorted()
    .collect(java.util.stream.Collectors.toList());
List<String> last10Final = finalManifests.subList(Math.max(0, finalManifests.size() - 10), finalManifests.size());

assertEquals("Last 10 manifest files should remain the same after cleanup", last10Initial, last10Final);
```
Missing cleanup disable before test teardown — may cause flaky shutdown.
Other tests in this class (e.g., testRemoteCleanupDeleteStale at lines 161-168, testRemoteCleanupDeleteStaleIndexRoutingFiles at lines 239-244) disable the cleanup interval (-1) after assertions to avoid race conditions during shutdown. This test omits that step, which could lead to intermittent failures if a cleanup task fires during cluster teardown.
Proposed fix: disable cleanup before test ends
```diff
 assertEquals("Last 10 manifest files should remain the same after cleanup", last10Initial, last10Final);
+// disable the clean up to avoid race condition during shutdown
+response = client().admin()
+    .cluster()
+    .prepareUpdateSettings()
+    .setPersistentSettings(Settings.builder().put(REMOTE_CLUSTER_STATE_CLEANUP_INTERVAL_SETTING.getKey(), "-1"))
+    .get();
+assertTrue(response.isAcknowledged());
 }
```
try {
    this.deleteStaleClusterMetadata(
        currentAppliedState.getClusterName().value(),
        currentAppliedState.metadata().clusterUUID(),
        RETAINED_MANIFESTS
    );
    lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
} catch (RemoteStateCleanupFailedException e) {
    logger.error(
        "Failed to clean up stale remote cluster state files for cluster [{}] with uuid [{}]",
        currentAppliedState.getClusterName().value(),
        currentAppliedState.metadata().clusterUUID()
    );
}
catch (RemoteStateCleanupFailedException) is dead code — lastCleanupAttemptStateVersion is always advanced.
deleteStaleClusterMetadata (line 480) submits the batched cleanup to threadpool.executor(REMOTE_PURGE).execute(…) and returns immediately without throwing. The RemoteStateCleanupFailedException is thrown inside the executor's Runnable (line 531), which is on a different call stack. It never propagates back to this try-catch.
Consequence: lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion at line 223 always executes, regardless of cleanup success or failure. This defeats the stated PR objective of preventing version advancement on failure, and means a failed cleanup still moves the version forward — causing the next cleanUpStaleFiles invocation to skip cleanup if the state-version delta is below the threshold. Stale files will pile up exactly as described in issue #20564.
Additionally, the caught exception e is not included in the logger.error call (line 225), so even if this path were reachable, the stack trace would be lost.
A possible fix is to move the lastCleanupAttemptStateVersion update into the async success path, or to make deleteStaleClusterMetadata synchronous since cleanUpStaleFiles already runs on REMOTE_PURGE. For example:
Sketch: move version update into the async block
- try {
- this.deleteStaleClusterMetadata(
- currentAppliedState.getClusterName().value(),
- currentAppliedState.metadata().clusterUUID(),
- RETAINED_MANIFESTS
- );
- lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
- } catch (RemoteStateCleanupFailedException e) {
- logger.error(
- "Failed to clean up stale remote cluster state files for cluster [{}] with uuid [{}]",
- currentAppliedState.getClusterName().value(),
- currentAppliedState.metadata().clusterUUID()
- );
- }
+ this.deleteStaleClusterMetadata(
+ currentAppliedState.getClusterName().value(),
+ currentAppliedState.metadata().clusterUUID(),
+ RETAINED_MANIFESTS,
+ cleanUpAttemptStateVersion
+ );

Then inside deleteStaleClusterMetadata, after the batch loop completes successfully:
// inside the try block, after the while loop
lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
} catch (Exception e) {
    logger.error(
        "Exception occurred while deleting Remote Cluster Metadata for clusterUUIDs [{}]. Exception: {}",
        clusterUUID,
        e
    );
    throw new RemoteStateCleanupFailedException(
        new ParameterizedMessage(
            "Exception occurred while deleting Remote Cluster Metadata for clusterUUIDs [{}]. Exception: {}",
            clusterUUID,
            e
        ),
        e
    );
RemoteStateCleanupFailedException thrown inside executor's Runnable becomes an uncaught exception.
Since this exception is thrown from within threadpool.executor(…).execute(Runnable), it is not caught by any application-level handler. It will propagate to the thread's UncaughtExceptionHandler (which in OpenSearch logs and replaces the thread). The finally block at line 539 correctly resets deleteStaleMetadataRunning, so the lock is released.
However, after the critical bug above is fixed (making the version advancement conditional on success), re-evaluate whether this throw is still needed or whether logging + setting a failure flag is more appropriate.
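A minimal sketch of that alternative: run the cleanup on the executor, log the failure with its Throwable inside the Runnable instead of rethrowing, and advance the version only on success. Names here (scheduleCleanup, deleteStaleFiles, remotePurge) are illustrative stand-ins, not the PR's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: keep failures contained inside the Runnable and
// advance lastCleanupAttemptStateVersion only after the cleanup succeeds.
public class AsyncCleanupSketch {
    final AtomicLong lastCleanupAttemptStateVersion = new AtomicLong(-1);
    final ExecutorService remotePurge = Executors.newSingleThreadExecutor();

    void scheduleCleanup(long attemptVersion, Runnable deleteStaleFiles) {
        remotePurge.execute(() -> {
            try {
                deleteStaleFiles.run();
                // advance only after the batch loop completed without error
                lastCleanupAttemptStateVersion.set(attemptVersion);
            } catch (Exception e) {
                // log with the Throwable so the stack trace is preserved;
                // do NOT rethrow: it would only reach the UncaughtExceptionHandler
                System.err.println("cleanup failed: " + e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        AsyncCleanupSketch s = new AsyncCleanupSketch();
        s.scheduleCleanup(5, () -> { throw new RuntimeException("delete timeout"); });
        s.scheduleCleanup(7, () -> {});
        s.remotePurge.shutdown();
        s.remotePurge.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(s.lastCleanupAttemptStateVersion.get()); // prints 7
    }
}
```

With this shape, a failed attempt (version 5 above) leaves the version untouched, so the next cleanUpStaleFiles invocation still sees enough version delta to retry.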
List<List<BlobMetadata>> staleInvocations = staleCaptor.getAllValues();
assertTrue(
    "Second invocation should have manifest-90 in stale list",
    staleInvocations.get(0).stream().anyMatch(blob -> "manifest-91.dat".equals(blob.name()))
);
}
Assertion message doesn't match the actual check.
The message says "manifest-90" but the predicate checks for "manifest-91.dat". Based on the test setup, the stale manifest in the second batch is indeed manifest-91.dat.
Suggested fix
assertTrue(
- "Second invocation should have manifest-90 in stale list",
+ "Second invocation should have manifest-91 in stale list",
staleInvocations.get(0).stream().anyMatch(blob -> "manifest-91.dat".equals(blob.name()))
);
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #20515      +/-   ##
============================================
- Coverage     73.33%    73.22%     -0.11%
+ Complexity    72125     72110        -15
============================================
  Files          5798      5800         +2
  Lines        329654    329814       +160
  Branches      47491     47522        +31
============================================
- Hits         241741    241521       -220
- Misses        68504     68962       +458
+ Partials      19409     19331        -78

☔ View full report in Codecov by Sentry.
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Closing this PR in favor of #20566
Description
Fixes remote cluster state cleanup failures that were causing stale metadata to pile up in remote storage when deletions time out, along with the fixes mentioned in the tagged GitHub issue.
Issue context (#20564):
RemoteClusterStateCleanupManager runs every ~5 minutes (configurable) and performs sequential deletions (global metadata → index metadata → ephemeral attrs → manifests). A recent change added a 30s timeout to the S3 “sync” delete path; with large delete sets this can throw IOException, abort the whole cleanup run (single try/catch), and leave later phases undeleted, making the next run even larger and more likely to fail.

What this PR does
- Batched deletion of manifest* blobs to reduce per-call delete size and avoid timeout/payload issues.
- New settings cluster.remote_store.state.cleanup.batch_size and cluster.remote_store.state.cleanup.max_batches to bound each cleanup run, plus a configurable cleanup timeout.
- Fixed lastCleanupAttemptStateVersion being advanced even though the cleanup was failing, which caused subsequent deletes to become a no-op when there were no cluster-state changes (or fewer than 10) and previous deletions had failed.
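The batching described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: BATCH_SIZE and MAX_BATCHES mirror the new settings' defaults (1000 and 100), and deleteBatch stands in for the real blob-container delete call.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a bounded, batched stale-manifest deletion loop.
public class BatchedCleanupSketch {
    static final int BATCH_SIZE = 1000;  // cluster.remote_store.state.cleanup.batch_size
    static final int MAX_BATCHES = 100;  // cluster.remote_store.state.cleanup.max_batches

    // Deletes stale manifests in bounded batches; returns how many were deleted.
    static int deleteInBatches(List<String> staleManifests) {
        int deleted = 0;
        int batches = 0;
        while (deleted < staleManifests.size() && batches < MAX_BATCHES) {
            int end = Math.min(deleted + BATCH_SIZE, staleManifests.size());
            // one bounded delete call per batch keeps each request small
            deleteBatch(staleManifests.subList(deleted, end));
            deleted = end;
            batches++;
        }
        return deleted;
    }

    static void deleteBatch(List<String> batch) {
        // placeholder for the actual blob-container bulk delete with timeout
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) names.add("manifest-" + i + ".dat");
        System.out.println(deleteInBatches(names)); // prints 2500 (3 batches)
    }
}
```

Capping both the batch size and the batch count means a single oversized cleanup can no longer exceed the delete-call timeout; any remainder beyond MAX_BATCHES is simply picked up by the next scheduled run.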
Resolves #20564
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.