Merged
Conversation
imotov
approved these changes
Nov 7, 2017
Contributor
imotov
left a comment
LGTM from the snapshot/restore perspective. Love the simplification. Left one minor observation. Feel free to ignore it if you disagree.
Contributor
I feel like a functional interface here doesn't really buy us anything compared to a simple if statement with two calls.
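For illustration, a hedged sketch of the reviewer's suggestion: dispatching on the blob size with a plain if statement rather than a functional interface. The method and constant names here (`writeBlob`, `executeSingleUpload`, `executeMultipartUpload`, `BUFFER_SIZE`) are hypothetical, not the actual Elasticsearch code.

```java
// Hypothetical sketch: a plain if statement choosing between the single-upload
// and multipart-upload paths, instead of abstracting the choice behind a
// functional interface. All names are illustrative.
public class UploadDispatch {

    // assumed threshold: blobs up to this size go through a single request
    static final long BUFFER_SIZE = 5L * 1024 * 1024;

    static String writeBlob(String blobName, long blobSize) {
        if (blobSize <= BUFFER_SIZE) {
            return executeSingleUpload(blobName, blobSize);
        } else {
            return executeMultipartUpload(blobName, blobSize);
        }
    }

    static String executeSingleUpload(String blobName, long blobSize) {
        return "single:" + blobName; // stand-in for a single PutObject request
    }

    static String executeMultipartUpload(String blobName, long blobSize) {
        return "multipart:" + blobName; // stand-in for the multipart API calls
    }

    public static void main(String[] args) {
        System.out.println(writeBlob("small-blob", 1024));
        System.out.println(writeBlob("large-blob", 100L * 1024 * 1024));
    }
}
```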
This was referenced Nov 9, 2017
Closed
ywelsch
approved these changes
Nov 9, 2017
Contributor
ywelsch
left a comment
LGTM. Left one minor nit.
Contributor
Can you extract these as constants and then use those instead of redefining them in `S3BlobContainer`? Can you also document the limitations of S3 in the Javadoc here?
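The limits being discussed are S3's upload constraints. A hypothetical sketch of what such extracted, Javadoc-documented constants could look like (the class and constant names are illustrative, not the actual Elasticsearch code; the numeric limits are S3's documented ones):

```java
/**
 * Illustrative sketch of S3 upload limits extracted as documented constants.
 * Class and constant names are assumptions, not the actual code.
 */
public final class S3Limits {

    /** S3 rejects multipart parts smaller than 5 MB (except the last part of an upload). */
    public static final long MIN_PART_SIZE_USING_MULTIPART = 5L * 1024 * 1024;

    /** S3 rejects multipart parts larger than 5 GB. */
    public static final long MAX_PART_SIZE_USING_MULTIPART = 5L * 1024 * 1024 * 1024;

    /** A single PutObject request can upload at most 5 GB. */
    public static final long MAX_SINGLE_UPLOAD_SIZE = 5L * 1024 * 1024 * 1024;

    /** A multipart upload can consist of at most 10,000 parts. */
    public static final int MAX_NUMBER_OF_PARTS = 10_000;

    private S3Limits() {}
}
```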
4aae0ce to
cc3c690
Compare
tlrx
added a commit
that referenced
this pull request
Nov 10, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
tlrx
added a commit
that referenced
this pull request
Nov 10, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
martijnvg
added a commit
that referenced
this pull request
Nov 14, 2017
* es/master: (24 commits)
  - Reduce synchronization on field data cache
  - add json-processor support for non-map json types (#27335)
  - Properly format IndexGraveyard deletion date as date (#27362)
  - Upgrade AWS SDK Jackson Databind to 2.6.7.1
  - Stop responding to ping requests before master abdication (#27329)
  - [Test] Fix POI version in packaging tests
  - Allow affix settings to specify dependencies (#27161)
  - Tests: Improve size regex in documentation test (#26879)
  - reword comment
  - Remove unnecessary logger creation for doc values field data
  - [Geo] Decouple geojson parse logic from ShapeBuilders
  - [DOCS] Fixed link to docker content
  - Plugins: Add versionless alias to all security policy codebase properties (#26756)
  - [Test] #27342 Fix SearchRequests#testValidate
  - [DOCS] Move X-Pack-specific Docker content (#27333)
  - Fail queries with scroll that explicitely set request_cache (#27342)
  - [Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts()
  - Set minimum_master_nodes to all nodes for REST tests (#27344)
  - [Tests] Relax allowed delta in extended_stats aggregation (#27171)
  - Remove S3 output stream (#27280)
  - ...
tlrx
added a commit
that referenced
this pull request
Nov 14, 2017
Now that the blob size information is available before writing anything, the repository implementation can know upfront which API will be more suitable to upload the blob to S3. This commit removes the DefaultS3OutputStream and S3OutputStream classes and moves the implementation of the upload logic directly into S3BlobContainer. Related #26993, closes #26969
Member
Author
This change has been backported to 5.6.5, 6.0.1, 6.1.0 and master.
The `S3OutputStream` class was added in Elasticsearch 1.4 in order to take advantage of the AWS Multipart Upload API. At that time, the Snapshot/Restore API didn't communicate the size of the blobs to be written to the repository implementation: the logic was to open an `OutputStream` and start writing bytes to it. With this logic, we decided to buffer the bytes in memory until we knew which API to use, a single upload request or multipart upload requests. This is why the buffer and its associated logic were implemented as an `OutputStream`. Now that the blob size information is available before anything is written, the repository implementation can know upfront which API is more suitable to upload the blob to S3.
This pull request removes the `DefaultS3OutputStream` and `S3OutputStream` classes and moves the implementation of the upload logic directly into `S3BlobContainer`. I think it's easier to understand and easier to maintain. It also avoids the internal buffering by passing the `InputStream` directly to the S3 client, which takes care of buffering (up to 16 MB) and retrying the requests. It reduces pressure on memory (in my tests, some allocations outside TLAB dropped from 2.7 GB to 57 MB, while total snapshot time dropped by 35%), especially on nodes with small heaps like 1 GB to 4 GB. Note: this has been tested together with #27278.
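Since the blob size is now known before anything is written, the part layout of a multipart upload can be computed upfront instead of being discovered while buffering. A minimal sketch of that arithmetic, assuming a helper like the one exercised by `S3BlobStoreContainerTests.testNumberOfMultiparts()` (the class and method names here are illustrative, not the actual `S3BlobContainer` code):

```java
// Illustrative sketch: derive the number of multipart parts and the size of
// the last part from a blob size that is known upfront. Names are assumptions.
public class MultipartMath {

    /**
     * Returns {numberOfParts, sizeOfLastPart} for a blob of totalSize bytes
     * split into parts of partSize bytes each.
     */
    static long[] numberOfParts(long totalSize, long partSize) {
        long fullParts = totalSize / partSize;
        long remaining = totalSize % partSize;
        if (remaining == 0) {
            // blob divides evenly: the last part is a full-size part
            return new long[]{fullParts, partSize};
        }
        // one extra, smaller part carries the remainder
        return new long[]{fullParts + 1, remaining};
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] parts = numberOfParts(12 * mb, 5 * mb); // 12 MB blob, 5 MB parts
        System.out.println(parts[0] + " parts, last part " + parts[1] + " bytes");
        // prints: 3 parts, last part 2097152 bytes
    }
}
```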