Merged
Conversation
imotov
approved these changes
Nov 7, 2017
Contributor
imotov
left a comment
LGTM from the snapshot/restore perspective. Love the simplification. Left one minor observation. Feel free to ignore it if you disagree.
Contributor
I feel like a functional interface here doesn't really buy us anything compared to a simple if statement with two calls.
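For illustration, a hedged sketch of the reviewer's suggestion: dispatching on the blob size with a plain if statement rather than a functional interface. The method and constant names here (`writeBlob`, `executeSingleUpload`, `executeMultipartUpload`, `BUFFER_SIZE`) are hypothetical, not the actual Elasticsearch code.

```java
// Hypothetical sketch: a plain if statement choosing between the single-upload
// and multipart-upload paths, instead of abstracting the choice behind a
// functional interface. All names are illustrative.
public class UploadDispatch {

    // assumed threshold: blobs up to this size go through a single request
    static final long BUFFER_SIZE = 5L * 1024 * 1024;

    static String writeBlob(String blobName, long blobSize) {
        if (blobSize <= BUFFER_SIZE) {
            return executeSingleUpload(blobName, blobSize);
        } else {
            return executeMultipartUpload(blobName, blobSize);
        }
    }

    static String executeSingleUpload(String blobName, long blobSize) {
        return "single:" + blobName; // stand-in for a single PutObject request
    }

    static String executeMultipartUpload(String blobName, long blobSize) {
        return "multipart:" + blobName; // stand-in for the multipart API calls
    }

    public static void main(String[] args) {
        System.out.println(writeBlob("small-blob", 1024));
        System.out.println(writeBlob("large-blob", 100L * 1024 * 1024));
    }
}
```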
This was referenced Nov 9, 2017
Closed
ywelsch
approved these changes
Nov 9, 2017
Contributor
ywelsch
left a comment
LGTM. Left one minor nit.
Contributor
Can you extract these as constants and then use those instead of redefining them in `S3BlobContainer`? Can you also document the limitations of S3 in the Javadoc here?
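The limits being discussed are S3's upload constraints. A hypothetical sketch of what such extracted, Javadoc-documented constants could look like (the class and constant names are illustrative, not the actual Elasticsearch code; the numeric limits are S3's documented ones):

```java
/**
 * Illustrative sketch of S3 upload limits extracted as documented constants.
 * Class and constant names are assumptions, not the actual code.
 */
public final class S3Limits {

    /** S3 rejects multipart parts smaller than 5 MB (except the last part of an upload). */
    public static final long MIN_PART_SIZE_USING_MULTIPART = 5L * 1024 * 1024;

    /** S3 rejects multipart parts larger than 5 GB. */
    public static final long MAX_PART_SIZE_USING_MULTIPART = 5L * 1024 * 1024 * 1024;

    /** A single PutObject request can upload at most 5 GB. */
    public static final long MAX_SINGLE_UPLOAD_SIZE = 5L * 1024 * 1024 * 1024;

    /** A multipart upload can consist of at most 10,000 parts. */
    public static final int MAX_NUMBER_OF_PARTS = 10_000;

    private S3Limits() {}
}
```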
4aae0ce to
cc3c690
Compare
tlrx
added a commit
that referenced
this pull request
Nov 10, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
tlrx
added a commit
that referenced
this pull request
Nov 10, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
martijnvg
added a commit
that referenced
this pull request
Nov 14, 2017
* es/master: (24 commits)
  - Reduce synchronization on field data cache
  - add json-processor support for non-map json types (#27335)
  - Properly format IndexGraveyard deletion date as date (#27362)
  - Upgrade AWS SDK Jackson Databind to 2.6.7.1
  - Stop responding to ping requests before master abdication (#27329)
  - [Test] Fix POI version in packaging tests
  - Allow affix settings to specify dependencies (#27161)
  - Tests: Improve size regex in documentation test (#26879)
  - reword comment
  - Remove unnecessary logger creation for doc values field data
  - [Geo] Decouple geojson parse logic from ShapeBuilders
  - [DOCS] Fixed link to docker content
  - Plugins: Add versionless alias to all security policy codebase properties (#26756)
  - [Test] #27342 Fix SearchRequests#testValidate
  - [DOCS] Move X-Pack-specific Docker content (#27333)
  - Fail queries with scroll that explicitely set request_cache (#27342)
  - [Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts()
  - Set minimum_master_nodes to all nodes for REST tests (#27344)
  - [Tests] Relax allowed delta in extended_stats aggregation (#27171)
  - Remove S3 output stream (#27280)
  - ...
tlrx
added a commit
that referenced
this pull request
Nov 14, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
Member
Author
This change has been backported to 5.6.5, 6.0.1, 6.1.0 and master.
The `S3OutputStream` class was added in Elasticsearch 1.4 in order to take advantage of the AWS Multipart Upload API. At that time, the Snapshot/Restore API didn't communicate the size of the blobs to be written to the repository implementation: the logic was to open an `OutputStream` and start writing bytes to it. With this logic, we decided to buffer the bytes in memory until we knew which API to use, a single upload request or multipart upload requests. This is why the buffer and its associated logic were implemented as an `OutputStream`. Now that the blob size information is available before anything is written, the repository implementation can know upfront which API is more suitable to upload the blob to S3.
This pull request removes the `DefaultS3OutputStream` and `S3OutputStream` classes and moves the implementation of the upload logic directly into `S3BlobContainer`. I think it's easier to understand and easier to maintain. It also avoids the internal buffering by passing the `InputStream` directly to the S3 client, which takes care of buffering (up to 16 MB) and retrying the requests. It reduces pressure on memory (in my tests, some allocations outside TLAB dropped from 2.7 GB to 57 MB, while total snapshot time dropped by 35%), especially on nodes with small heaps like 1 GB to 4 GB. Note: this has been tested together with #27278.
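Since the blob size is now known before anything is written, the part layout of a multipart upload can be computed upfront instead of being discovered while buffering. A minimal sketch of that arithmetic, assuming a helper like the one exercised by `S3BlobStoreContainerTests.testNumberOfMultiparts()` (the class and method names here are illustrative, not the actual `S3BlobContainer` code):

```java
// Illustrative sketch: derive the number of multipart parts and the size of
// the last part from a blob size that is known upfront. Names are assumptions.
public class MultipartMath {

    /**
     * Returns {numberOfParts, sizeOfLastPart} for a blob of totalSize bytes
     * split into parts of partSize bytes each.
     */
    static long[] numberOfParts(long totalSize, long partSize) {
        long fullParts = totalSize / partSize;
        long remaining = totalSize % partSize;
        if (remaining == 0) {
            // blob divides evenly: the last part is a full-size part
            return new long[]{fullParts, partSize};
        }
        // one extra, smaller part carries the remainder
        return new long[]{fullParts + 1, remaining};
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] parts = numberOfParts(12 * mb, 5 * mb); // 12 MB blob, 5 MB parts
        System.out.println(parts[0] + " parts, last part " + parts[1] + " bytes");
        // prints: 3 parts, last part 2097152 bytes
    }
}
```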