Implementing batched deletions of stale ClusterMetadataManifests in RemoteClusterStateCleanupManager and adding configurable deletion timeouts #20515
Conversation
Important: Review skipped. Auto incremental reviews are disabled on this repository.
📝 Walkthrough

This pull request introduces timeout-aware, batched cleanup for remote cluster state files with configurable parameters. It adds TimeValue-based timeout handling across blob container deletion operations, implements batched manifest deletion in RemoteClusterStateCleanupManager, and introduces three new cluster settings for batch size, maximum batches, and cleanup timeout, preventing deletion-operation timeouts from causing a cascading buildup of stale files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CM as ClusterManager
    participant RCC as RemoteClusterStateCleanupManager
    participant BSTS as BlobStoreTransferService
    participant BC as BlobContainer
    participant RRTS as RemoteRoutingTableService

    CM->>RCC: scheduleCleanup()
    activate RCC
    RCC->>RCC: listAllManifests()
    RCC->>RCC: identifyStaleFiles()
    RCC->>RCC: initBatchLoop(batchSize=1000, maxBatches=100)
    loop for each batch until stale list exhausted or maxBatches reached
        RCC->>RRTS: deleteStaleIndexRoutingPaths(stalePaths, timeout)
        activate RRTS
        RRTS->>BC: deleteBlobsIgnoringIfNotExists(files, timeout)
        activate BC
        BC-->>RRTS: success
        deactivate BC
        RRTS-->>RCC: complete
        deactivate RRTS
        RCC->>RRTS: deleteStaleIndexRoutingDiffPaths(diffPaths, timeout)
        activate RRTS
        RRTS->>BC: deleteBlobsIgnoringIfNotExists(files, timeout)
        activate BC
        BC-->>RRTS: success
        deactivate BC
        RRTS-->>RCC: complete
        deactivate RRTS
        RCC->>BSTS: deleteBlobs(manifestPaths, timeout)
        activate BSTS
        BSTS->>BC: deleteBlobsIgnoringIfNotExists(manifests, timeout)
        activate BC
        BC-->>BSTS: success
        deactivate BC
        BSTS-->>RCC: complete
        deactivate BSTS
        alt Exception in batch
            RCC->>RCC: wrapInRemoteStateCleanupFailedException()
            RCC->>RCC: log error & continue next batch
        end
    end
    deactivate RCC
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (warning)
❌ Gradle check result for 0077d20: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…emoteClusterStateCleanupManager and adding configurable deletions timeouts Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
…manager-batched-deletions-and-timeouts Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java (1)
449-468: ⚠️ Potential issue | 🟠 Major

Precision loss: `timeout.seconds()` truncates a sub-second `TimeValue` to 0, causing an immediate `TimeoutException`.

`TimeValue.seconds()` performs integer division, so a value like `TimeValue.timeValueMillis(500)` yields `0`. Calling `future.get(0, SECONDS)` will time out instantly. Use `millis()` with `TimeUnit.MILLISECONDS` to preserve the full resolution of the provided `TimeValue`. The error message on line 467 has the same issue — it would report "0 seconds" for sub-second timeouts.
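The truncation is easy to reproduce with plain `java.util.concurrent`, independent of OpenSearch's `TimeValue` (the class and helper below are illustrative, not PR code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TruncationDemo {
    // Models the integer division inside TimeValue.seconds(): 500 ms -> 0 s.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    public static void main(String[] args) {
        long timeoutMillis = 500;
        System.out.println(toSeconds(timeoutMillis)); // prints 0

        // A pending future plus a zero timeout: get(0, SECONDS) raises
        // TimeoutException immediately instead of waiting the intended 500 ms.
        CompletableFuture<String> pending = new CompletableFuture<>();
        try {
            pending.get(toSeconds(timeoutMillis), TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out immediately"); // this branch runs
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing `timeout.millis()` with `TimeUnit.MILLISECONDS` instead preserves the caller's intent for any sub-second value.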
🐛 Proposed fix

```diff
 try {
     if (timeout.equals(TimeValue.MINUS_ONE)) {
         return future.get();
     } else {
-        return future.get(timeout.seconds(), TimeUnit.SECONDS);
+        return future.get(timeout.millis(), TimeUnit.MILLISECONDS);
     }
 } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     throw new IOException("Future got interrupted", e);
 } catch (ExecutionException e) {
     if (e.getCause() instanceof IOException) {
         throw (IOException) e.getCause();
     }
     throw new RuntimeException(e.getCause());
 } catch (TimeoutException e) {
     FutureUtils.cancel(future);
-    throw new IOException(String.format(Locale.ROOT, "Delete operation timed out after %s seconds", timeout.seconds()), e);
+    throw new IOException(String.format(Locale.ROOT, "Delete operation timed out after [%s]", timeout), e);
 }
```

server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (1)
773-787: ⚠️ Potential issue | 🟡 Minor

Spy `cleanUpManager` is created but never used.

Line 785 calls `remoteClusterStateCleanupManager.cleanUpStaleFiles()` (the original) instead of `cleanUpManager.cleanUpStaleFiles()` (the spy). The test passes because the skip-path doesn't reach `deleteStaleClusterMetadata` regardless, but this is inconsistent with `testRemoteCleanupCallsDeleteIfVersionIncrementGreaterThanThreshold` (line 802), which correctly calls `cleanUpManager.cleanUpStaleFiles()`.

Suggested fix

```diff
- remoteClusterStateCleanupManager.cleanUpStaleFiles();
+ cleanUpManager.cleanUpStaleFiles();
```
🤖 Fix all issues with AI agents
In
`@server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java`:
- Around line 340-362: This test omits disabling the periodic cleanup before
teardown which can race with shutdown; after the final assertions (after
computing last10Final) update the cleanup interval to -1 the same way other
tests do so the background task stops — e.g., call the
RemoteClusterStateCleanupManager cleanup interval setting setter (set the
RemoteClusterStateCleanupManager cleanup interval setting to -1) or invoke the
same helper used in
testRemoteCleanupDeleteStale/testRemoteCleanupDeleteStaleIndexRoutingFiles to
disable the cleanup task before the test exits.
In
`@server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java`:
- Around line 525-538: The Runnable submitted via threadpool.executor in
RemoteClusterStateCleanupManager currently throws
RemoteStateCleanupFailedException which becomes an uncaught exception on the
worker thread; instead remove the throw from inside the executor and replace it
with logging plus setting a failure indicator so callers can observe the failure
without crashing the thread. Specifically, in the Runnable (the block that logs
"Exception occurred while deleting Remote Cluster Metadata...") stop rethrowing
RemoteStateCleanupFailedException, keep/emphasize logger.error with the
exception, and set a class-level failure flag or status field (e.g., a boolean
like deleteStaleMetadataFailed or a status enum) so the outer manager can react;
update any code that previously relied on the exception to check that flag (and
re-evaluate that behavior once you make version advancement conditional on
success).
- Around line 217-230: The try/catch around deleteStaleClusterMetadata in
cleanUpStaleFiles is ineffective because deleteStaleClusterMetadata submits work
asynchronously via threadpool.executor(REMOTE_PURGE).execute(...) and returns
immediately, so RemoteStateCleanupFailedException thrown in the Runnable never
reaches this catch and lastCleanupAttemptStateVersion is always advanced; fix by
moving the update of lastCleanupAttemptStateVersion into the asynchronous
success path (i.e., set lastCleanupAttemptStateVersion =
cleanUpAttemptStateVersion only after the cleanup loop inside
deleteStaleClusterMetadata completes without error), and ensure when catching
RemoteStateCleanupFailedException inside the Runnable you log the exception
(include the Throwable) and do NOT advance lastCleanupAttemptStateVersion there;
alternatively make deleteStaleClusterMetadata synchronous if you prefer to keep
the update in cleanUpStaleFiles.
In
`@server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java`:
- Around line 1172-1177: Update the assertion message in
RemoteClusterStateCleanupManagerTests to accurately reflect the checked value:
the test currently asserts that the second invocation contains "manifest-91.dat"
via staleInvocations/staleCaptor but the message says "manifest-90"; change the
message to refer to "manifest-91.dat" (or alternately adjust the predicate to
check for "manifest-90" if that was intended) so the assertion message and the
anyMatch check on staleInvocations.get(0).stream() are consistent.
🧹 Nitpick comments (9)
server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java (1)
22-25: Redundant `toString()` override.

This override simply delegates to `super.toString()`, which is the default behavior. It can be removed.

♻️ Suggested removal

```diff
 public class RemoteStateCleanupFailedException extends RuntimeException {
     public RemoteStateCleanupFailedException(ParameterizedMessage parameterizedMessage, Throwable cause) {
         super(parameterizedMessage.toString(), cause);
     }
-
-    @Override
-    public String toString() {
-        return super.toString();
-    }
 }
```

plugins/repository-s3/src/test/java/org/opensearch/repositories/s3/S3BlobContainerTimeoutTests.java (1)
39-95: Consider adding edge-case tests for timeout behavior.

The current tests cover the happy path (successful completion) and verify call delegation, which is good. Consider adding tests for:

- `TimeValue.MINUS_ONE` → verifies the `future.get()` (infinite wait) path
- A future that doesn't complete within the timeout → verifies `TimeoutException` handling and future cancellation

These would exercise the branching logic in `getFutureValue` more thoroughly.

server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java (1)
285-295: Javadoc is missing `@param timeout` documentation.

The new `timeout` parameter is not documented in the Javadoc. Also, consider whether `warn` level is appropriate here — if a concrete implementation never overrides this method, every cleanup call will emit a warning. `debug` or `info` might be less noisy for a fallback path that's expected for most `BlobContainer` implementations.

Proposed Javadoc and log-level fix

```diff
 /**
  * Deletes the blobs with given names. This method will not throw an exception
  * when one or multiple of the given blobs don't exist and simply ignore this case.
  *
  * @param blobNames The names of the blob to delete.
+ * @param timeout   The timeout for the delete operation. Implementations that do not
+ *                  support timeouts will ignore this parameter and delegate to
+ *                  {@link #deleteBlobsIgnoringIfNotExists(List)}.
  * @throws IOException if a subset of blob exists but could not be deleted.
  */
 default void deleteBlobsIgnoringIfNotExists(List<String> blobNames, TimeValue timeout) throws IOException {
-    LogManager.getLogger(this.getClass()).warn("No implementation exists for using timeout, skipping timeout input");
+    LogManager.getLogger(this.getClass()).debug("No implementation exists for using timeout, skipping timeout input");
     deleteBlobsIgnoringIfNotExists(blobNames);
 }
```

server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java (1)
324-331: Prefer static import over inline fully-qualified class reference.

`java.util.stream.Collectors` is used inline on lines 330 and 347 as `java.util.stream.Collectors.toList()`. Consider adding a static import for consistency with the rest of the file's import style.

Add import at top of file

```diff
+import java.util.stream.Collectors;
```

Then replace `java.util.stream.Collectors.toList()` with `Collectors.toList()` on lines 330 and 347.
80-86: Consider delegating to the non-timeout variant instead of throwing.

The `BlobContainer` interface handles the same pattern by falling back to the non-timeout overload in its default method. Here, the default throws `UnsupportedOperationException`, which means any `RemoteRoutingTableService` implementation that doesn't override these will fail at runtime rather than degrade gracefully.

Since `InternalRemoteRoutingTableService` does override these, this isn't an immediate issue, but a consistent fallback pattern would be more defensive.

Proposed fallback to non-timeout variant

```diff
 default void deleteStaleIndexRoutingPaths(List<String> stalePaths, TimeValue timeout) throws IOException {
-    throw new UnsupportedOperationException("deleteStaleIndexRoutingPaths with timeouts is not supported");
+    deleteStaleIndexRoutingPaths(stalePaths);
 }

 default void deleteStaleIndexRoutingDiffPaths(List<String> stalePaths, TimeValue timeout) throws IOException {
-    throw new UnsupportedOperationException("deleteStaleIndexRoutingDiffPaths with timeouts is not supported");
+    deleteStaleIndexRoutingDiffPaths(stalePaths);
 }
```

server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java (1)
271-311: Consider reducing duplication between timeout and non-timeout variants.

The four methods (lines 271–311) share near-identical structure — the only difference is the extra `TimeValue` parameter. You could have the non-timeout methods delegate to the timeout variants with a `null` timeout, reducing the duplicated try/catch/log/rethrow blocks.

Example: delegate non-timeout to timeout variant

```diff
 @Override
 public void deleteStaleIndexRoutingPaths(List<String> stalePaths) throws IOException {
-    try {
-        logger.debug(() -> "Deleting stale index routing files from remote - " + stalePaths);
-        blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths);
-    } catch (IOException e) {
-        logger.error(() -> new ParameterizedMessage("Failed to delete some stale index routing paths from {}", stalePaths), e);
-        throw e;
-    }
+    deleteStaleIndexRoutingPaths(stalePaths, null);
 }

 @Override
 public void deleteStaleIndexRoutingPaths(List<String> stalePaths, TimeValue timeValue) throws IOException {
     try {
         logger.debug(() -> "Deleting stale index routing files from remote - " + stalePaths);
-        blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths, timeValue);
+        if (timeValue != null) {
+            blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths, timeValue);
+        } else {
+            blobStoreRepository.blobStore().blobContainer(BlobPath.cleanPath()).deleteBlobsIgnoringIfNotExists(stalePaths);
+        }
     } catch (IOException e) {
         logger.error(() -> new ParameterizedMessage("Failed to delete some stale index routing paths from {}", stalePaths), e);
         throw e;
     }
 }
```

Same pattern for the diff-path methods.
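The fallback-delegation idea behind both routing-service suggestions can be modeled with a plain Java interface (names below are illustrative, not the OpenSearch API): the new timeout overload is a default method, so implementations that predate the parameter keep compiling and simply ignore the timeout.

```java
import java.util.List;

// Minimal model: a new overload added as a default method that falls back to
// the pre-existing single-argument variant instead of throwing.
interface Container {
    void delete(List<String> names);

    default void delete(List<String> names, long timeoutMillis) {
        // Fallback: implementations without timeout support ignore the value.
        delete(names);
    }
}

public class FallbackDemo {
    static int deleted = 0;

    public static void main(String[] args) {
        // This implementation only provides the old single-argument method...
        Container c = names -> deleted += names.size();
        // ...yet the timeout overload still works via the default delegation.
        c.delete(List.of("a", "b"), 500);
        System.out.println(deleted); // prints 2
    }
}
```

The same shape degrades gracefully at runtime, whereas an `UnsupportedOperationException` default fails only when the new overload is first exercised.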
server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (1)
908-1013: Good batch-processing test coverage.

The four batched-deletion tests (`testBatchedDeletionWithSingleBatch`, `testBatchedDeletionWithMultipleBatches`, `testBatchedDeletionStopsWhenManifestsLessThanRetained`, `testBatchedDeletionExhaustsMaxBatches`) comprehensively cover the key scenarios. However, there is no test for the failure path where `deleteClusterMetadata` throws mid-batch and `RemoteStateCleanupFailedException` is raised. Adding such a test would help verify batch-level error handling and, importantly, would surface the `lastCleanupAttemptStateVersion`-advancement issue described in the production code review.

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (2)
179-201: Rescheduling the async deletion task on non-interval setting changes is unnecessary.

`rescheduleAsyncDeletionTask()` is called when `cleanupBatchSize`, `cleanupMaxBatches`, or `cleanupTimeout` are updated. These don't affect the task's scheduling interval — the next run will simply pick up the new field values. Only `updateCleanupInterval` genuinely needs to reschedule. The `rescheduleIfNecessary()` call is harmless but adds noise.
111-113: Consider `volatile` for fields written on the cluster-applier thread and read on `REMOTE_PURGE`.

`cleanupBatchSize`, `cleanupMaxBatches`, and `cleanupTimeout` are written by settings-update consumers (cluster-applier thread) and read by the batch-processing loop (`REMOTE_PURGE` thread). Without `volatile`, updated values may not be visible to the reader thread due to the Java Memory Model. This is consistent with the existing non-volatile `staleFileCleanupInterval`, but the new fields are read in a tight loop during batch processing, increasing the window for stale reads.
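The visibility guarantee that `volatile` buys can be illustrated with a small two-thread sketch (field name borrowed from the discussion; this is a model, not the manager's code):

```java
// One thread updates a setting, another reads it in a loop. With volatile the
// reader is guaranteed to eventually observe the write; a plain field carries
// no such guarantee under the Java Memory Model.
public class VolatileDemo {
    static volatile int cleanupBatchSize = 1000;

    public static void main(String[] args) {
        Thread reader = new Thread(() -> {
            while (cleanupBatchSize == 1000) {
                Thread.onSpinWait(); // spin until the update becomes visible
            }
            System.out.println("observed " + cleanupBatchSize);
        });
        reader.start();
        cleanupBatchSize = 2000; // "settings-update" thread
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping `volatile` here makes the reader's termination formally unguaranteed, which is exactly the stale-read risk flagged for the batch loop.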
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java
- plugins/repository-s3/src/test/java/org/opensearch/repositories/s3/S3BlobContainerTimeoutTests.java
- server/src/internalClusterTest/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerIT.java
- server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/cluster/routing/remote/RemoteRoutingTableService.java
- server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java
- server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java
- server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java
- server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java
- server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java
- server/src/test/java/org/opensearch/common/blobstore/BlobContainerTests.java
- server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java
- server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java
- server/src/test/java/org/opensearch/index/translog/transfer/BlobStoreTransferServiceTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
- server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteStateCleanupFailedException.java
- server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java
- server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java
- server/src/main/java/org/opensearch/cluster/routing/remote/RemoteRoutingTableService.java
- server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java
- server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
- server/src/main/java/org/opensearch/cluster/routing/remote/InternalRemoteRoutingTableService.java
- server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java
📚 Learning: 2026-01-14T09:31:12.773Z
Learnt from: udabhas
Repo: opensearch-project/OpenSearch PR: 20095
File: server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java:4146-4165
Timestamp: 2026-01-14T09:31:12.773Z
Learning: For OpenSearch’s S3 path, index-level CryptoMetadata is passed from BlobStoreRepository to S3BlobContainer, and the final KMS key/context merge with repository settings is performed in SseKmsUtil.configureEncryptionSettings. Do not merge in BlobStoreRepository to avoid double-merging.
Applied to files:
- server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java
- server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java
- plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManagerTests.java (3)
server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java (1)
- BlobStoreTransferService (52-376)

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (2)

- RemoteClusterStateCleanupManager (48-655)
- AsyncStaleFileDeletion (626-654)

server/src/main/java/org/opensearch/threadpool/ThreadPool.java (1)

- Names (99-130)
server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)
server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (1)
- RemoteClusterStateCleanupManager (48-655)
server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (1)
server/src/main/java/org/opensearch/threadpool/ThreadPool.java (1)
- Names (99-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
🔇 Additional comments (17)
server/src/main/java/org/opensearch/common/blobstore/EncryptedBlobContainer.java (1)
183-186: LGTM! Consistent delegation pattern matching the existing non-timeout overload.
server/src/main/java/org/opensearch/index/translog/transfer/TransferService.java (1)
105-107: LGTM! Backward-compatible interface extension with a sensible `UnsupportedOperationException` default, overridden by `BlobStoreTransferService`.

plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java (2)
125-125: LGTM! Good change from a raw `long` to a typed `TimeValue` constant, improving readability and consistency with the rest of the timeout-aware API surface.

441-446: LGTM! Clean timeout-forwarding overload, consistent with the non-timeout variant above.
server/src/main/java/org/opensearch/index/translog/transfer/BlobStoreTransferService.java (1)
288-291: LGTM! Clean override of the `TransferService` default, consistently delegating to the timeout-aware blob container method.

server/src/main/java/org/opensearch/cluster/routing/remote/NoopRemoteRoutingTableService.java (1)
125-133: LGTM! Noop overloads consistent with the existing patterns in this class.
server/src/test/java/org/opensearch/common/blobstore/EncryptedBlobContainerTests.java (1)
317-328: LGTM! Clean test verifying timeout-aware delegation from `EncryptedBlobContainer` to the underlying `BlobContainer`.

server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)
784-786: LGTM! New cleanup settings are properly registered alongside existing remote cluster state settings. The settings definitions (from `RemoteClusterStateCleanupManager`) include appropriate validation bounds and are correctly scoped as `NodeScope` + `Dynamic`.

server/src/test/java/org/opensearch/common/blobstore/BlobContainerTests.java (1)
25-33: LGTM! Good use of `doCallRealMethod` to exercise the default interface method and verify it delegates to the non-timeout overload.

server/src/test/java/org/opensearch/index/translog/transfer/BlobStoreTransferServiceTests.java (2)
78-96: LGTM on new mock scaffolding. The mock fields and wiring are cleanly added without affecting existing tests (which construct their own `TransferService` instances).
319-327: LGTM on timeout delegation test. Cleanly verifies that `deleteBlobs(path, fileNames, timeout)` delegates to `deleteBlobsIgnoringIfNotExists(fileNames, timeout)` on the underlying container.
297-325: LGTM on timeout-aware verifications in `testDeleteClusterMetadata`. All deletion operations are correctly verified with `CLUSTER_STATE_CLEANUP_TIMEOUT_DEFAULT`, and the `never()` + `verifyNoMoreInteractions` combination ensures completeness.
827-906: Good coverage of new settings defaults and dynamic updates. These tests thoroughly validate default values, dynamic updates via `clusterSettings.applySettings`, and correct propagation.

server/src/main/java/org/opensearch/gateway/remote/RemoteClusterStateCleanupManager.java (4)
52-56: LGTM on new default constants. Clear naming and appropriate default values. `CLUSTER_STATE_CLEANUP_TIMEOUT_DEFAULT = TimeValue.MINUS_ONE` as "no timeout" is consistent with OpenSearch conventions.
73-105: LGTM on new settings. The batch size minimum of `RETAINED_MANIFESTS + SKIP_CLEANUP_STATE_CHANGES` (= 20) ensures there's always room for retention. A max-batches minimum of 1 and a timeout minimum of `MINUS_ONE` are appropriate.
426-453: Good ordering: manifests deleted last. Deleting stale routing paths and routing diff paths before their manifests avoids leaving dangling routing files if a subsequent batch or run fails. This aligns with the PR objective.
486-543: LGTM on the batch-processing loop structure. The batch loop with configurable `cleanupBatchSize` and `cleanupMaxBatches`, the guard via `deleteStaleMetadataRunning`, and the `finally` block releasing the lock are well-structured. The early exit when `batchManifests.size() <= manifestsToRetain` correctly terminates processing.
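The batching arithmetic this loop relies on can be modeled in isolation. The sketch below uses illustrative names and values, not the actual `RemoteClusterStateCleanupManager` fields: stale manifests (all but the newest `manifestsToRetain`) are carved into fixed-size batches, and the run is capped at `maxBatches`.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoopDemo {
    // Plans which manifests one cleanup run would delete, oldest first.
    static List<List<String>> planBatches(List<String> manifests, int manifestsToRetain, int batchSize, int maxBatches) {
        List<List<String>> batches = new ArrayList<>();
        int staleCount = manifests.size() - manifestsToRetain; // newest N are kept
        for (int b = 0; b < maxBatches && staleCount > 0; b++) {
            int from = b * batchSize;
            int to = Math.min(from + batchSize, staleCount);
            if (from >= to) {
                break; // stale list exhausted before maxBatches was reached
            }
            batches.add(new ArrayList<>(manifests.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> manifests = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            manifests.add("manifest-" + i); // index 0 is oldest
        }
        // 25 manifests, retain 10 -> 15 stale; batchSize 4, maxBatches 3
        // caps this run at 12 deletions; the remaining 3 wait for the next run.
        List<List<String>> batches = planBatches(manifests, 10, 4, 3);
        System.out.println(batches.size()); // 3
        System.out.println(batches.get(2)); // [manifest-8, manifest-9, manifest-10, manifest-11]
    }
}
```

The `maxBatches` cap is what bounds a single cleanup run; leftover stale manifests are simply picked up on a later run.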
```java
assertBusy(() -> {
    List<String> currentManifests = repository.blobStore()
        .blobContainer(manifestContainerPath)
        .listBlobsByPrefix("manifest")
        .keySet()
        .stream()
        .sorted()
        .collect(java.util.stream.Collectors.toList());
    assertTrue("Manifests should be cleaned up in batches", currentManifests.size() < initialManifests.size());
});

List<String> finalManifests = repository.blobStore()
    .blobContainer(manifestContainerPath)
    .listBlobsByPrefix("manifest")
    .keySet()
    .stream()
    .sorted()
    .collect(java.util.stream.Collectors.toList());
List<String> last10Final = finalManifests.subList(Math.max(0, finalManifests.size() - 10), finalManifests.size());

assertEquals("Last 10 manifest files should remain the same after cleanup", last10Initial, last10Final);
```
Missing cleanup disable before test teardown — may cause flaky shutdown.
Other tests in this class (e.g., testRemoteCleanupDeleteStale at lines 161-168, testRemoteCleanupDeleteStaleIndexRoutingFiles at lines 239-244) disable the cleanup interval (-1) after assertions to avoid race conditions during shutdown. This test omits that step, which could lead to intermittent failures if a cleanup task fires during cluster teardown.
Proposed fix: disable cleanup before test ends
```diff
 assertEquals("Last 10 manifest files should remain the same after cleanup", last10Initial, last10Final);
+// disable the clean up to avoid race condition during shutdown
+response = client().admin()
+    .cluster()
+    .prepareUpdateSettings()
+    .setPersistentSettings(Settings.builder().put(REMOTE_CLUSTER_STATE_CLEANUP_INTERVAL_SETTING.getKey(), "-1"))
+    .get();
+assertTrue(response.isAcknowledged());
 }
```
try {
    this.deleteStaleClusterMetadata(
        currentAppliedState.getClusterName().value(),
        currentAppliedState.metadata().clusterUUID(),
        RETAINED_MANIFESTS
    );
    lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
} catch (RemoteStateCleanupFailedException e) {
    logger.error(
        "Failed to clean up stale remote cluster state files for cluster [{}] with uuid [{}]",
        currentAppliedState.getClusterName().value(),
        currentAppliedState.metadata().clusterUUID()
    );
}
catch (RemoteStateCleanupFailedException) is dead code — lastCleanupAttemptStateVersion is always advanced.
deleteStaleClusterMetadata (line 480) submits the batched cleanup to threadpool.executor(REMOTE_PURGE).execute(…) and returns immediately without throwing. The RemoteStateCleanupFailedException is thrown inside the executor's Runnable (line 531), which is on a different call stack. It never propagates back to this try-catch.
Consequence: lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion at line 223 always executes, regardless of cleanup success or failure. This defeats the stated PR objective of preventing version advancement on failure, and means a failed cleanup still moves the version forward — causing the next cleanUpStaleFiles invocation to skip cleanup if the state-version delta is below the threshold. Stale files will pile up exactly as described in issue #20564.
Additionally, the caught exception e is not included in the logger.error call (line 225), so even if this path were reachable, the stack trace would be lost.
A possible fix is to move the lastCleanupAttemptStateVersion update into the async success path, or to make deleteStaleClusterMetadata synchronous since cleanUpStaleFiles already runs on REMOTE_PURGE. For example:
Sketch: move version update into the async block
- try {
- this.deleteStaleClusterMetadata(
- currentAppliedState.getClusterName().value(),
- currentAppliedState.metadata().clusterUUID(),
- RETAINED_MANIFESTS
- );
- lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
- } catch (RemoteStateCleanupFailedException e) {
- logger.error(
- "Failed to clean up stale remote cluster state files for cluster [{}] with uuid [{}]",
- currentAppliedState.getClusterName().value(),
- currentAppliedState.metadata().clusterUUID()
- );
- }
+ this.deleteStaleClusterMetadata(
+ currentAppliedState.getClusterName().value(),
+ currentAppliedState.metadata().clusterUUID(),
+ RETAINED_MANIFESTS,
+ cleanUpAttemptStateVersion
+ );

Then inside deleteStaleClusterMetadata, after the batch loop completes successfully:
// inside the try block, after the while loop
lastCleanupAttemptStateVersion = cleanUpAttemptStateVersion;
} catch (Exception e) {
    logger.error(
        "Exception occurred while deleting Remote Cluster Metadata for clusterUUIDs [{}]. Exception: {}",
        clusterUUID,
        e
    );
    throw new RemoteStateCleanupFailedException(
        new ParameterizedMessage(
            "Exception occurred while deleting Remote Cluster Metadata for clusterUUIDs [{}]. Exception: {}",
            clusterUUID,
            e
        ),
        e
    );
RemoteStateCleanupFailedException thrown inside executor's Runnable becomes an uncaught exception.
Since this exception is thrown from within threadpool.executor(…).execute(Runnable), it is not caught by any application-level handler. It will propagate to the thread's UncaughtExceptionHandler (which in OpenSearch logs and replaces the thread). The finally block at line 539 correctly resets deleteStaleMetadataRunning, so the lock is released.
However, after the critical bug above is fixed (making the version advancement conditional on success), re-evaluate whether this throw is still needed or whether logging + setting a failure flag is more appropriate.
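A minimal sketch of that alternative: run the cleanup on the executor, log the failure with its Throwable inside the Runnable instead of rethrowing, and advance the version only on success. Names here (scheduleCleanup, deleteStaleFiles, remotePurge) are illustrative stand-ins, not the PR's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: keep failures contained inside the Runnable and
// advance lastCleanupAttemptStateVersion only after the cleanup succeeds.
public class AsyncCleanupSketch {
    final AtomicLong lastCleanupAttemptStateVersion = new AtomicLong(-1);
    final ExecutorService remotePurge = Executors.newSingleThreadExecutor();

    void scheduleCleanup(long attemptVersion, Runnable deleteStaleFiles) {
        remotePurge.execute(() -> {
            try {
                deleteStaleFiles.run();
                // advance only after the batch loop completed without error
                lastCleanupAttemptStateVersion.set(attemptVersion);
            } catch (Exception e) {
                // log with the Throwable so the stack trace is preserved;
                // do NOT rethrow: it would only reach the UncaughtExceptionHandler
                System.err.println("cleanup failed: " + e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        AsyncCleanupSketch s = new AsyncCleanupSketch();
        s.scheduleCleanup(5, () -> { throw new RuntimeException("delete timeout"); });
        s.scheduleCleanup(7, () -> {});
        s.remotePurge.shutdown();
        s.remotePurge.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(s.lastCleanupAttemptStateVersion.get()); // prints 7
    }
}
```

With this shape, a failed attempt (version 5 above) leaves the version untouched, so the next cleanUpStaleFiles invocation still sees enough version delta to retry.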
List<List<BlobMetadata>> staleInvocations = staleCaptor.getAllValues();
assertTrue(
    "Second invocation should have manifest-90 in stale list",
    staleInvocations.get(0).stream().anyMatch(blob -> "manifest-91.dat".equals(blob.name()))
);
}
Assertion message doesn't match the actual check.
The message says "manifest-90" but the predicate checks for "manifest-91.dat". Based on the test setup, the stale manifest in the second batch is indeed manifest-91.dat.
Suggested fix
assertTrue(
- "Second invocation should have manifest-90 in stale list",
+ "Second invocation should have manifest-91 in stale list",
staleInvocations.get(0).stream().anyMatch(blob -> "manifest-91.dat".equals(blob.name()))
);
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #20515      +/-   ##
============================================
- Coverage     73.33%    73.22%     -0.11%
+ Complexity    72125     72110        -15
============================================
  Files          5798      5800         +2
  Lines        329654    329814       +160
  Branches      47491     47522        +31
============================================
- Hits         241741    241521       -220
- Misses        68504     68962       +458
+ Partials      19409     19331        -78

☔ View full report in Codecov by Sentry.
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
Closing this PR in favor of #20566
Description
Fixes remote cluster state cleanup failures that were causing stale metadata to pile up in remote storage when deletions time out, along with the fixes mentioned in the tagged GitHub issue.
Issue context (#20564):
RemoteClusterStateCleanupManager runs every ~5 minutes (configurable) and performs sequential deletions (global metadata → index metadata → ephemeral attrs → manifests). A recent change added a 30s timeout to the S3 “sync” delete path; with large delete sets this can throw IOException, abort the whole cleanup run (single try/catch), and leave later phases undeleted, making the next run even larger and more likely to fail.

What this PR does
- Batched deletion of manifest* blobs to reduce per-call delete size and avoid timeout/payload issues.
- New settings cluster.remote_store.state.cleanup.batch_size and cluster.remote_store.state.cleanup.max_batches to bound each cleanup run, plus a configurable cleanup timeout.
- Fixed lastCleanupAttemptStateVersion being advanced even though the cleanup was failing, which caused subsequent deletes to become a no-op when there were no cluster-state changes (or fewer than 10) and previous deletions had failed.
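The batching described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: BATCH_SIZE and MAX_BATCHES mirror the new settings' defaults (1000 and 100), and deleteBatch stands in for the real blob-container delete call.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a bounded, batched stale-manifest deletion loop.
public class BatchedCleanupSketch {
    static final int BATCH_SIZE = 1000;  // cluster.remote_store.state.cleanup.batch_size
    static final int MAX_BATCHES = 100;  // cluster.remote_store.state.cleanup.max_batches

    // Deletes stale manifests in bounded batches; returns how many were deleted.
    static int deleteInBatches(List<String> staleManifests) {
        int deleted = 0;
        int batches = 0;
        while (deleted < staleManifests.size() && batches < MAX_BATCHES) {
            int end = Math.min(deleted + BATCH_SIZE, staleManifests.size());
            // one bounded delete call per batch keeps each request small
            deleteBatch(staleManifests.subList(deleted, end));
            deleted = end;
            batches++;
        }
        return deleted;
    }

    static void deleteBatch(List<String> batch) {
        // placeholder for the actual blob-container bulk delete with timeout
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) names.add("manifest-" + i + ".dat");
        System.out.println(deleteInBatches(names)); // prints 2500 (3 batches)
    }
}
```

Capping both the batch size and the batch count means a single oversized cleanup can no longer exceed the delete-call timeout; any remainder beyond MAX_BATCHES is simply picked up by the next scheduled run.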
Resolves #20564
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.