Save Memory on Large Repository Metadata Blob Writes #74313
original-brownbear merged 35 commits into elastic:master from original-brownbear:save-memory-large-meta-blob-writes
Conversation
    boolean atomic,
    CheckedConsumer<OutputStream, IOException> writer
) throws IOException {
    // TODO: this is just a stop-gap solution for until we have an encrypted output stream wrapper
Given how work on the encrypted repo has stalled yet again at this point, I don't think it's worth it investing a lot of time here to code up an output stream implementation as well. We can do that (and it shouldn't be too hard) when the time comes.
Maybe we should create an issue to keep track of this?
final String indexBlob = INDEX_FILE_PREFIX + Long.toString(newGen);
logger.debug("Repository [{}] writing new index generational blob [{}]", metadata.name(), indexBlob);
try (ReleasableBytesStreamOutput out = new ReleasableBytesStreamOutput(bigArrays)) {
writeAtomic(blobContainer(), indexBlob, out -> {
As we already do streaming reads of RepositoryData, adding streaming writes here removes the 2G size limit on it (though the most important part of this is that we save ourselves the trouble of materializing the bytes on heap for large repos, where this has often put a lot of stress on the master's heap).
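To illustrate the shape of the change, here is a minimal sketch of a consumer-based streaming write: the caller receives the container's `OutputStream` and serializes directly into it, so heap use is bounded by the stream's buffer rather than the blob size. The `BlobWriter` interface and `writeBlob` method below are hypothetical stand-ins (the real code uses `CheckedConsumer<OutputStream, IOException>` and the repository's blob container), and the `ByteArrayOutputStream` sink stands in for an actual upload stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Main {
    // Stand-in for CheckedConsumer<OutputStream, IOException>: the callback
    // serializes into the stream it is handed, with no size known up front.
    interface BlobWriter {
        void write(OutputStream out) throws IOException;
    }

    // Stand-in for a blob-container write; a real implementation would stream
    // to the repository (S3/GCS/Azure) rather than buffer in memory.
    static byte[] writeBlob(BlobWriter writer) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writer.write(sink);
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] blob = writeBlob(out -> {
            // Serialization happens incrementally; no full byte[] of the
            // blob is ever required on the caller's side.
            for (int i = 0; i < 3; i++) {
                out.write(("entry-" + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        });
        System.out.println(blob.length); // 24
    }
}
```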
Extracted the chunked output stream logic from #74313 and added tests for it to make it easier to review.
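For intuition about the chunked output stream idea mentioned here, the following is an illustrative sketch only, not the extracted Elasticsearch class: bytes written to the stream are cut into fixed-size parts, the way a multipart upload bounds memory to one part at a time. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Splits written bytes into fixed-size parts; a real implementation would
// upload each completed part instead of collecting it in a list.
class ChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> parts = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();

    ChunkedOutputStream(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        current.write(b);
        if (current.size() == chunkSize) {
            flushPart(); // part is full: hand it off, start a fresh buffer
        }
    }

    @Override
    public void close() {
        if (current.size() > 0) {
            flushPart(); // final, possibly short, part
        }
    }

    private void flushPart() {
        parts.add(current.toByteArray());
        current = new ByteArrayOutputStream();
    }

    List<byte[]> parts() {
        return parts;
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        ChunkedOutputStream out = new ChunkedOutputStream(4);
        out.write("0123456789".getBytes(StandardCharsets.UTF_8)); // 10 bytes
        out.close();
        System.out.println(out.parts().size());        // 3 parts: 4 + 4 + 2
        System.out.println(out.parts().get(2).length); // 2
    }
}
```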
...pository-s3/src/test/java/org/elasticsearch/repositories/s3/S3BlobContainerRetriesTests.java
server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
Thanks Francisco and Tanguy!
This PR adds a new API for doing streaming serialization writes to a repository, enabling repository metadata of arbitrary size to be written with bounded memory. The existing write APIs require knowing the eventual blob size beforehand. This forced us to materialize the serialized blob in memory before writing, costing a lot of memory in the case of e.g. a very large RepositoryData (and limiting us to a 2G max blob size). With this PR the requirement to fully materialize the serialized metadata goes away, and the memory overhead becomes bounded entirely by the outbound buffer size of the repository implementation. As we move to larger repositories, this makes master node stability much more predictable, since writing out RepositoryData no longer takes as much memory (the same applies to shard-level metadata), enables aggregating multiple metadata blobs into a single larger blob without massive overhead, and removes the 2G size limit on RepositoryData. backport of #74313 and #74620
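The contrast the description draws can be sketched in a few lines, under hypothetical names: the old-style API needs the blob size (and therefore all the bytes) before the write starts, while the new-style API hands the writer an `OutputStream` and learns the size only as bytes flow through. The counting stream below stands in for the repository's real upload stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Main {
    // Old-style API (hypothetical signature): size must be known up front,
    // forcing the caller to materialize the whole blob first.
    static long writeBlob(String name, byte[] bytes, long blobSize) {
        return blobSize; // stand-in for uploading exactly blobSize bytes
    }

    // New-style API (hypothetical): the callback streams; no size in advance.
    interface CheckedWriter {
        void accept(OutputStream out) throws IOException;
    }

    static long writeBlobStreaming(String name, CheckedWriter writer) throws IOException {
        final long[] count = {0};
        // Counting stream stands in for a chunked repository upload stream.
        OutputStream counting = new OutputStream() {
            @Override
            public void write(int b) {
                count[0]++;
            }
        };
        writer.accept(counting);
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        // Old path: the full serialized blob sits on the heap before the call.
        byte[] all = "repository-data".getBytes(StandardCharsets.UTF_8);
        long oldBytes = writeBlob("index-42", all, all.length);

        // New path: bytes are produced inside the callback, never held whole.
        long newBytes = writeBlobStreaming("index-42",
            out -> out.write("repository-data".getBytes(StandardCharsets.UTF_8)));

        System.out.println(oldBytes == newBytes); // true: same blob, less memory
    }
}
```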
See elastic#53119 for more context about why those tests are muted on JDK8. They have started failing more often now that elastic#74313 and elastic#74620 have been merged, as reported in elastic#74739.
@original-brownbear, do you know if this PR should change in any way our statement in the docs saying that

Just for completeness' sake here: @eedugon, as discussed on another channel, this is waiting for a docs update only. The