Add Default Retry on InternalError during CompleteMultipartUpload

We've been seeing repeated failures on a system that stores files in S3. They are reported up the chain as follows:

 We encountered an internal error. Please try again. (Service: null; Status Code: 0; Error Code: InternalError; Request ID: [snip]).

I've turned on logging for the relevant bucket and tracked the source down to this entry:

[owner] [bucketname] [15/Oct/2015:16:52:03 +0000] [ip] arn:aws:iam::[accountnumber]:user/[iamuser] [request] REST.POST.UPLOAD [object] "POST [uploaddetails] HTTP/1.1" 200 InternalError 282 28885144 97 95 "-" "aws-sdk-java/1.10.16 Linux/3.10.0-229.14.1.el7.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/25.45-b02/1.8.0_45" - 

In all cases, the failures are InternalErrors on the Complete call for uploads that the SDK sends as multipart uploads. We're not "hand-rolling" the multipart upload: we create a `PutObjectRequest`, call `upload` on a `TransferManager`, then call `waitForCompletion()` on the returned `Upload` object. We also pass a custom `RetryCondition` implementation to our S3 client, which neatly handles most errors.

However, having traced through the SDK code, it looks like a `RetryCondition` is only consulted when the HTTP client receives a non-OK response. For CompleteMultipartUpload, as both the docs and the log entry above show, an InternalError is _not_ an HTTP failure: the response status is 200 and the error is reported in the body, so none of the retry conditions we've defined ever apply.
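To illustrate why a status-code-based retry check never fires here: the S3 API reference notes that CompleteMultipartUpload can return 200 OK and still fail, because S3 sends the 200 header before it has finished processing the request. The sketch below is plain JDK code, not the SDK's own response handling. The class and method names (`CompleteResponseCheck`, `errorCodeOrNull`, `isRetryableDespite200`) are hypothetical, and it assumes a failed completion returns an `<Error>` document with a `<Code>` child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: classifies a CompleteMultipartUpload body that arrived with HTTP 200.
final class CompleteResponseCheck {

    /** Returns the S3 error code if the body is an <Error> document, or null if it is not. */
    static String errorCodeOrNull(String body) {
        // S3 may send whitespace while it processes the request; trim it so that an
        // XML declaration, if present, sits at the very start as the parser requires.
        String xml = body.trim();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            if (!"Error".equals(root.getTagName())) {
                return null; // e.g. <CompleteMultipartUploadResult>: the upload completed
            }
            NodeList codes = root.getElementsByTagName("Code");
            return codes.getLength() > 0 ? codes.item(0).getTextContent() : "Unknown";
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable response body", e);
        }
    }

    /** True when the status code alone says success but the body reports InternalError. */
    static boolean isRetryableDespite200(int httpStatus, String body) {
        return httpStatus == 200 && "InternalError".equals(errorCodeOrNull(body));
    }

    public static void main(String[] args) {
        String failed = "   \n<Error><Code>InternalError</Code>"
                + "<Message>We encountered an internal error. Please try again.</Message></Error>";
        String ok = "<CompleteMultipartUploadResult><ETag>\"abc-2\"</ETag></CompleteMultipartUploadResult>";
        System.out.println(isRetryableDespite200(200, failed)); // true
        System.out.println(isRetryableDespite200(200, ok));     // false
    }
}
```

A retry decision that looks only at the status code sees 200 in both cases; only the body distinguishes them.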

I've looked through the SDK, and I'm reasonably confident that nothing around the Complete call made by a default `upload` retries that final request if it fails. It would be ideal if this could be added. We can't handle it in our own code as things stand, because the multipart upload happens "under the hood": when our code is told about the failure, there's no obvious way to extract the details of the specific call that failed and force a retry of it.
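A minimal sketch of the kind of wrapping being asked for: re-send only the final complete request (same upload ID, same part list) while S3 reports InternalError, with bounded attempts and exponential backoff. `S3ErrorException` is a hypothetical stand-in for the SDK's service exception so the example compiles without the SDK, and the `Supplier` stands in for the complete call. It assumes, as the error message suggests, that re-sending the same complete request is safe.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class CompleteWithRetry {

    /** Hypothetical stand-in for the SDK's service exception; only the error code matters here. */
    static final class S3ErrorException extends RuntimeException {
        final String errorCode;

        S3ErrorException(String errorCode) {
            super(errorCode);
            this.errorCode = errorCode;
        }
    }

    /**
     * Calls completeCall until it succeeds, retrying only when the error code is InternalError,
     * for at most maxAttempts attempts, sleeping baseDelayMillis * 2^(attempt - 1) between them.
     */
    static <T> T completeWithRetry(Supplier<T> completeCall, int maxAttempts, long baseDelayMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return completeCall.get();
            } catch (S3ErrorException e) {
                if (!"InternalError".equals(e.errorCode) || attempt >= maxAttempts) {
                    throw e; // not retryable, or out of attempts
                }
                try {
                    Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e; // stop retrying if the thread is interrupted
                }
            }
        }
    }

    public static void main(String[] args) {
        String uploadId = "example-upload-id";                       // hypothetical
        List<String> partETags = Arrays.asList("etag-1", "etag-2");  // same parts on every attempt
        AtomicInteger calls = new AtomicInteger();
        String result = completeWithRetry(() -> {
            // Fake complete call: the first two attempts fail with InternalError.
            if (calls.incrementAndGet() < 3) {
                throw new S3ErrorException("InternalError");
            }
            return uploadId + " completed with " + partETags.size() + " parts";
        }, 5, 10L);
        System.out.println(result + " after " + calls.get() + " attempts");
        // prints "example-upload-id completed with 2 parts after 3 attempts"
    }
}
```

Because only the final request is repeated, the parts themselves are not sent again.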

If we were creating our multipart uploads "manually" for sufficiently large files, I believe we could easily add error handling for this ourselves. However, we currently use the async methods to launch many uploads in parallel, so I believe doing that would mean re-implementing a significant amount of the code around this functionality (suggestions on how to easily wrap this in some retry logic are welcome, though). To me, it would be much better if the SDK's own calls to CompleteMultipartUpload were wrapped so that completion is retried on failure, since that is what the error message itself recommends.
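Until something like that exists in the SDK, one possible workaround for the parallel case is to retry the whole upload without giving up the async style. This sketch resubmits a failed task to the same executor when its error is retryable, using only `CompletableFuture`. `fakeUpload` and `UploadFailedException` are hypothetical stand-ins for `transferManager.upload(request).waitForCompletion()` and the SDK's service exception, and it assumes the exception surfaced to the caller carries the `InternalError` code, as the message above suggests. Unlike retrying only the complete call, this re-sends the whole object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class ParallelUploadsWithRetry {

    /** Hypothetical stand-in for the SDK's service exception. */
    static final class UploadFailedException extends RuntimeException {
        final String errorCode;

        UploadFailedException(String errorCode) {
            super(errorCode);
            this.errorCode = errorCode;
        }
    }

    static boolean isInternalError(Throwable t) {
        return t instanceof UploadFailedException
                && "InternalError".equals(((UploadFailedException) t).errorCode);
    }

    /** Runs task on executor; on a retryable failure, resubmits it until attemptsLeft is used up. */
    static <T> CompletableFuture<T> withRetry(Supplier<T> task, Executor executor,
                                              int attemptsLeft, Predicate<Throwable> retryable) {
        CompletableFuture<T> result = new CompletableFuture<>();
        CompletableFuture.supplyAsync(task, executor).whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            // supplyAsync wraps the task's exception in a CompletionException.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause() : error;
            if (attemptsLeft > 1 && retryable.test(cause)) {
                withRetry(task, executor, attemptsLeft - 1, retryable).whenComplete((v, e) -> {
                    if (e == null) {
                        result.complete(v);
                    } else {
                        result.completeExceptionally(e);
                    }
                });
            } else {
                result.completeExceptionally(cause);
            }
        });
        return result;
    }

    // Hypothetical stand-in for transferManager.upload(request).waitForCompletion():
    // "b.bin" fails once with InternalError; every other key succeeds first time.
    static String fakeUpload(String key, Map<String, AtomicInteger> attempts) {
        int n = attempts.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
        if (key.equals("b.bin") && n == 1) {
            throw new UploadFailedException("InternalError");
        }
        return key + " uploaded on attempt " + n;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<String, AtomicInteger> attempts = new ConcurrentHashMap<>();
        List<CompletableFuture<String>> uploads = new ArrayList<>();
        for (String key : Arrays.asList("a.bin", "b.bin", "c.bin")) {
            uploads.add(withRetry(() -> fakeUpload(key, attempts), pool, 3,
                    ParallelUploadsWithRetry::isInternalError));
        }
        for (CompletableFuture<String> upload : uploads) {
            System.out.println(upload.join()); // b.bin reports attempt 2, the others attempt 1
        }
        pool.shutdown();
    }
}
```

Resubmitting from `whenComplete` rather than sleeping in a loop keeps pool threads free for the other uploads; a real version would likely also add a backoff delay between attempts.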


Add Default Retry on InternalError during CompleteMultipartUpload #538
