
Limit size of shardDeleteResults#133558

Merged
joshua-adams-1 merged 48 commits into elastic:main from joshua-adams-1:limit-shard-blobs-to-delete
Oct 21, 2025

Conversation

@joshua-adams-1
Contributor

@joshua-adams-1 joshua-adams-1 commented Aug 26, 2025

Modifies BlobStoreRepository.ShardBlobsToDelete.shardDeleteResults to have a variable size depending on the remaining heap space, rather than a hard-coded 2GB limit, which caused smaller nodes with less heap space to OOM.

Relates to #131822
Closes #116379

Closes ES-12540
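The idea of sizing the buffer relative to the heap can be sketched as follows. This is an illustrative sketch only: the 25% ratio, the 1MiB headroom, and the class/method names are assumptions for the example, not the PR's actual values.

```java
// Illustrative sketch (not the PR's actual code): size the delete-results
// buffer as a fraction of the JVM's max heap instead of a fixed 2GiB.
public class ShardDeleteResultsLimit {
    static final double HEAP_FRACTION = 0.25; // assumed ratio, not the real default

    static int computeLimitBytes(long maxHeapBytes) {
        long candidate = (long) (maxHeapBytes * HEAP_FRACTION);
        // Byte-array-backed streams cannot exceed Integer.MAX_VALUE bytes,
        // so clamp with a little headroom before narrowing to int.
        return Math.toIntExact(Math.min(candidate, Integer.MAX_VALUE - (1L << 20)));
    }

    public static void main(String[] args) {
        // On a 1 GiB heap this yields a 256 MiB limit under the assumed ratio.
        System.out.println(computeLimitBytes(1L << 30));
        System.out.println(computeLimitBytes(Runtime.getRuntime().maxMemory()));
    }
}
```

Because the limit is derived from the heap actually available, a small node gets a proportionally small buffer instead of attempting to grow toward 2GiB and OOMing.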

@joshua-adams-1 joshua-adams-1 marked this pull request as ready for review September 4, 2025 15:32
@joshua-adams-1 joshua-adams-1 requested a review from a team as a code owner September 4, 2025 15:32
@elasticsearchmachine elasticsearchmachine added the `needs:triage` (Requires assignment of a team area label) label Sep 4, 2025
@joshua-adams-1 joshua-adams-1 self-assigned this Sep 4, 2025
@elasticsearchmachine elasticsearchmachine added the `Team:Distributed Coordination (obsolete)` (Meta label for Distributed Coordination team. Obsolete. Please do not use.) label Sep 4, 2025
    // We only want to read this shard delete result if we were able to write the entire object.
    // Otherwise, for partial writes, an EOFException will be thrown upon reading
    if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
        successfullyWrittenBlobsCount += 1;
Member

This replaces resultCount, but it's the count of successfully recorded shards, not blobs.

    if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
        successfullyWrittenBlobsCount += 1;
    } else {
        leakedBlobsCount += 1;
Member

Likewise, this is recording the number of shards with leaked blobs rather than the number of leaked blobs. However, rather than just renaming the variable, I think we should actually count the number of leaked blobs (i.e. += blobsToDelete.size() here).


@DaveCTurner DaveCTurner left a comment


Good stuff, I left only tiny nits about the production code and a few other comments about the testing.

    ClusterSettings clusterSettings = clusterService.getClusterSettings();
    clusterSettings.initializeAndWatch(
        MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING,
        status -> this.maxHeapSizeForSnapshotDeletion = status
Member

Maybe apply the limit here on write (making the field an int) rather than on each read?

Suggested change:

    -    status -> this.maxHeapSizeForSnapshotDeletion = status
    +    maxHeapSizeForSnapshotDeletion -> this.maxHeapSizeForSnapshotDeletion = Math.toIntExact(
    +        Math.min(maxHeapSizeForSnapshotDeletion.getBytes(), Integer.MAX_VALUE - ByteSizeUnit.MB.toBytes(1))
    +    )
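The shape of this suggestion, clamping once inside the settings-update listener so that every subsequent read is a plain int load, can be sketched as a self-contained stand-in. The class and field names below are illustrative, not Elasticsearch's actual API.

```java
import java.util.function.LongConsumer;

// Sketch of the reviewer's suggestion: clamp the configured byte size once,
// at setting-update time, so the stored field is already a safe int and
// readers need no further bounds checks. Names here are hypothetical.
class SnapshotDeletionHeapLimit {
    private volatile int maxHeapSizeForSnapshotDeletion;

    // Invoked whenever the (hypothetical) cluster setting changes; the value
    // is clamped below Integer.MAX_VALUE with 1MiB of headroom, then narrowed.
    final LongConsumer onSettingUpdate = bytes -> this.maxHeapSizeForSnapshotDeletion =
        Math.toIntExact(Math.min(bytes, Integer.MAX_VALUE - (1L << 20)));

    int current() {
        return maxHeapSizeForSnapshotDeletion;
    }
}
```

Clamping on write keeps the overflow handling in one place; the alternative of clamping on every read repeats the same arithmetic at each call site.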

Comment on lines +1747 to +1760
    boolean writeTruncated = false;
    // There is a minimum of 1 byte available for writing
    if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
        new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
        // We only want to read this shard delete result if we were able to write the entire object.
        // Otherwise, for partial writes, an EOFException will be thrown upon reading
        if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
            resultsCount += 1;
        } else {
            writeTruncated = true;
        }
    } else {
        writeTruncated = true;
    }
Member

A matter of taste, but consider extracting this section into its own method to clarify that we only get false on the branch that succeeded; all other paths lead to true (and maybe we should invert that so that true means "success").

        private boolean writeBlobsIfCapacity(IndexId indexId, int shardId, Collection<String> blobsToDelete) throws IOException {
            // There is a minimum of 1 byte available for writing
            if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
                new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
                // We only want to read this shard delete result if we were able to write the entire object.
                // Otherwise, for partial writes, an EOFException will be thrown upon reading
                if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
                    resultsCount += 1;
                    return false;
                }
            }

            return true;
        }
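The before-and-after capacity check in that method can be demonstrated with a self-contained stand-in. CappedOutputStream below is a hypothetical substitute for the PR's truncating stream, and here the boolean is inverted so that true means "fully written", as the review suggests.

```java
import java.io.ByteArrayOutputStream;

// Stand-in for the PR's truncating stream: accepts writes up to a cap and
// silently drops the rest, exposing hasCapacity() like the real class.
class CappedOutputStream extends ByteArrayOutputStream {
    private final int cap;

    CappedOutputStream(int cap) {
        this.cap = cap;
    }

    boolean hasCapacity() {
        return size() < cap;
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        int room = Math.max(0, cap - size());
        super.write(b, off, Math.min(len, room)); // drop bytes beyond the cap
    }

    @Override
    public synchronized void write(int b) {
        if (size() < cap) super.write(b);
    }

    // Mirrors the check-before-and-after pattern: a record counts as a
    // success only if capacity remains afterwards, i.e. it was fully
    // retained. Hitting the cap is conservatively treated as truncation,
    // since a partial record would throw EOFException on read.
    boolean writeRecord(byte[] record) {
        if (!hasCapacity()) return false;
        write(record, 0, record.length);
        return hasCapacity();
    }
}
```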

Comment on lines +844 to +847
assertEquals(expectedShardGenerations.build(), shardBlobsToDelete.getUpdatedShardGenerations());
shardBlobsToDelete.getBlobPaths().forEachRemaining(s -> assertTrue(expectedBlobsToDelete.remove(s)));
assertThat(expectedBlobsToDelete, empty());
assertThat(shardBlobsToDelete.sizeInBytes(), lessThanOrEqualTo(Math.max(ByteSizeUnit.KB.toIntBytes(1), 20 * blobCount)));
Member

Hmm, it's not really within the implied contract of this class to be able to iterate its contents and then try to append more items. We've already closed the underlying compressed stream by this point. I think we should defer these assertions until the end.


// === Second, now capacity is exceeded, test whether subsequent writes are accepted without throwing an error === //

for (int i = 0; i < randomIntBetween(1, 20); i++) {
Member

This is trappy: it generates a new randomIntBetween value on each iteration, so you don't get a uniform distribution of iteration counts. Better to count down from between(1, 20) to zero, or else extract a variable for the upper bound.
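The trap and its fix can be shown side by side; between() below is a stand-in for the test framework's randomIntBetween(), and the class name is illustrative.

```java
import java.util.Random;

// Demonstrates the trap: re-drawing the random bound in the loop condition
// skews the iteration count, while drawing it once gives a uniform 1..20.
class RandomBoundTrap {
    static int between(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    // Trappy form: the condition draws a fresh bound every iteration, so the
    // loop tends to exit early and long runs become exponentially unlikely.
    static int trappyIterations(Random random) {
        int count = 0;
        for (int i = 0; i < between(random, 1, 20); i++) {
            count++;
        }
        return count;
    }

    // Fixed form: draw the bound once, as the review suggests.
    static int fixedIterations(Random random) {
        final int iterations = between(random, 1, 20);
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            count++;
        }
        return count;
    }
}
```

In the fixed form each count from 1 to 20 is equally likely; in the trappy form reaching iteration k requires surviving k independent draws, so high counts are heavily underrepresented.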

// === First, write blobs until capacity is exceeded === //

// While there is at least one byte in the stream, write
while (shardBlobsToDelete.sizeInBytes() < heapMemory) {
Member

WDYT about iterating until leakedBlobCount reaches some target value here, rather than doing the two separate loops?


@DaveCTurner DaveCTurner left a comment


LGTM

final var indexId = new IndexId(randomIdentifier(), randomUUID());
final var shardId = between(1, 30);
final var shardGeneration = new ShardGeneration(randomUUID());
// Always write at least one blob, guaranteeing that the shardDeleteResults stream increases in size
Member

👍 well spotted (in fact the stream grows anyway because we write the index ID, but that doesn't necessarily increase leakedBlobCount)

@joshua-adams-1 joshua-adams-1 merged commit 236c9fe into elastic:main Oct 21, 2025
34 checks passed
@joshua-adams-1 joshua-adams-1 deleted the limit-shard-blobs-to-delete branch October 21, 2025 14:33
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Oct 24, 2025
Modifies `BlobStoreRepository.ShardBlobsToDelete.shardDeleteResults` to have
a variable size depending on the remaining heap space rather than a
hard-coded 2GB size which caused smaller nodes with less heap
space to OOMe.

Relates to elastic#131822

Closes ES-12540
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Nov 3, 2025
Modifies `BlobStoreRepository.ShardBlobsToDelete.shardDeleteResults` to have
a variable size depending on the remaining heap space rather than a
hard-coded 2GB size which caused smaller nodes with less heap
space to OOMe.

Relates to elastic#131822

Closes ES-12540
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jan 8, 2026
In elastic#133558 we imposed a limit on the heap used to keep track of
shard-level blobs to clean up after the commit of a snapshot deletion.
This commit makes use of the same mechanism to track `IndexMetadata`
blobs for future deletion.

Closes elastic#140018
DaveCTurner added a commit that referenced this pull request Jan 12, 2026
In #133558 we imposed a limit on the heap used to keep track of
shard-level blobs to clean up after the commit of a snapshot deletion.
This commit makes use of the same mechanism to track `IndexMetadata`
blobs for future deletion.

Closes #140018
jimczi pushed a commit to jimczi/elasticsearch that referenced this pull request Jan 12, 2026
In elastic#133558 we imposed a limit on the heap used to keep track of
shard-level blobs to clean up after the commit of a snapshot deletion.
This commit makes use of the same mechanism to track `IndexMetadata`
blobs for future deletion.

Closes elastic#140018
@repantis repantis added the `:Distributed/Distributed` (A catch all label for anything in the Distributed Area. Please avoid if you can.) label and removed the `:Distributed Coordination/Distributed` label Jan 28, 2026

Labels

`:Distributed/Distributed` (A catch all label for anything in the Distributed Area. Please avoid if you can.), `>non-issue`, `Team:Distributed Coordination (obsolete)` (Meta label for Distributed Coordination team. Obsolete. Please do not use.), `v9.3.0`


Development

Successfully merging this pull request may close these issues.

Snapshot delete tasks do not complete if blobs-to-delete list exceeds 2GiB

4 participants