
Implement multipart upload for azureblob-sdk provider #904

Merged
gaul merged 10 commits into gaul:master from klaudworks:add-azureblobsdk-multipart-upload on Nov 20, 2025

Conversation

@klaudworks
Contributor

@klaudworks klaudworks commented Oct 18, 2025

Current Approach

The current approach forwards the input stream from the incoming request directly to the Azure SDK. I use a Flux<ByteBuffer> to work around the limitation that our input stream does not support mark/reset. I looked into azure-sdk-for-java after seeing your issue (Azure/azure-sdk-for-java#42603); this appears to be the only workaround.
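
For context, here is a minimal sketch of the idea (simplified, not the exact PR code): adapt the request's non-resettable InputStream into a Flux<ByteBuffer> that reads a fresh chunk per emission, which the async BlockBlobAsyncClient.stageBlock accepts. The helper name and the 4 MiB chunk size are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import reactor.core.publisher.Flux;

// Sketch: turn an InputStream of known length into a Flux<ByteBuffer>.
static Flux<ByteBuffer> toFlux(InputStream in, long contentLength, int chunkSize) {
    return Flux.<ByteBuffer, Long>generate(() -> 0L, (position, sink) -> {
        if (position >= contentLength) {
            sink.complete();
            return position;
        }
        int toRead = (int) Math.min(chunkSize, contentLength - position);
        byte[] chunk = new byte[toRead];  // fresh buffer per emission; the SDK may still hold earlier chunks
        try {
            int read = in.readNBytes(chunk, 0, toRead);
            if (read <= 0) {
                sink.complete();  // unexpected EOF
                return position;
            }
            sink.next(ByteBuffer.wrap(chunk, 0, read));
            return position + read;
        } catch (IOException ioe) {
            sink.error(new UncheckedIOException(ioe));
            return position;
        }
    });
}

// Usage with the async client:
// blockBlobAsyncClient
//     .stageBlock(base64BlockId, toFlux(stream, length, 4 * 1024 * 1024), length)
//     .block();
```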

Current Limitations

  1. Given that we don't really have a place to store MD5-based ETags for each part, we return them but don't actually use them to complete the multipart upload. Instead, we find all parts belonging to a multipart upload via the upload id (see the sketch after this list). For the assembled blob we simply return Azure's ETag.
  2. We can't make use of the azure-sdk-for-java retry logic on network issues; there is no way to solve this without persisting the input stream. The problem is pushed down to the S3Proxy client's AWS SDK, which will retry failed part uploads.
  3. We don't handle If-Match / If-None-Match headers for conditional writes, which S3 supports for CompleteMultipartUpload requests. EDIT: I added a commit to support conditional writes for CompleteMultipartUpload requests.
  4. AWS supports part sizes up to 5 GiB. I updated the maximum part size to 4000 MiB, Azure's block limit as documented at https://learn.microsoft.com/en-us/azure/storage/blobs/scalability-targets. The AWS SDK default part size is 8 MB, so I would rather not implement complicated chunking of parts on our side just to cover the edge case where people upload 4-5 GiB parts.
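
To make limitation 1 concrete, here is a rough sketch of how CompleteMultipartUpload can be assembled from the staged blocks, assuming a hypothetical naming scheme in which each base64 block name decodes to `<uploadId>-<partNumber>`; the helper names and the exact encoding are illustrative, not the PR's actual code.

```java
import com.azure.storage.blob.models.Block;
import com.azure.storage.blob.models.BlockListType;
import com.azure.storage.blob.specialized.BlockBlobClient;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: commit all uncommitted blocks that belong to this upload id, in part order.
static String completeUpload(BlockBlobClient blob, String uploadId) {
    List<String> blockIds = blob.listBlocks(BlockListType.UNCOMMITTED)
            .getUncommittedBlocks().stream()
            .map(Block::getName)
            // keep only blocks whose decoded name carries this upload id
            .filter(name -> decode(name).startsWith(uploadId + "-"))
            // order by the part number encoded after the upload id
            .sorted(Comparator.comparingInt((String name) ->
                    Integer.parseInt(decode(name).substring(uploadId.length() + 1))))
            .collect(Collectors.toList());
    // Azure computes its own ETag for the assembled blob; that is what we return to the S3 client.
    return blob.commitBlockList(blockIds).getETag();
}

static String decode(String base64BlockName) {
    return new String(Base64.getDecoder().decode(base64BlockName), StandardCharsets.UTF_8);
}
```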

Alternative Approaches

Using a BufferedInputStream

I discarded this solution due to potentially high memory usage when uploading multiple parts in parallel.

Persisting the input stream on disk

Limitation 1: potentially doubles the time until a part upload completes; this can be compensated for by uploading more parts in parallel.

Limitation 2: users need to be told to provide sufficient /tmp storage on sufficiently fast SSDs; on an HDD they would be heavily I/O bound.

Benefit: we could calculate proper ETags for each part and store the MD5 hash, e.g. encoded in the block name. This is not possible with the current solution because the block name must already be provided when we pass the request's input stream along to the Azure SDK.
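
For illustration, a minimal sketch of that discarded alternative, assuming a hypothetical helper that spools a part to a caller-supplied temp file while computing its MD5 (names and error handling are simplified, not code from this PR):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Sketch: copy the part to disk while hashing it, returning the MD5 hex digest.
static String spoolPartWithMd5(InputStream part, Path destination) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (DigestInputStream din = new DigestInputStream(part, md5)) {
        // the SDK could later re-read (and retry) from destination
        Files.copy(din, destination, StandardCopyOption.REPLACE_EXISTING);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();  // usable as the S3-style part ETag
}
```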

s3-tests Update

gaul/s3-tests#4

EDIT: I am also running this on a test Kubernetes cluster and it seems to work just fine. Some tooling (cnpg, a Kafka connector) can run its backups through it.

Fixes

#709 #553 #552

@klaudworks
Contributor Author

FYI @gaul I'm running this in a staging k8s cluster and it seems to work just fine.

Owner

@gaul gaul left a comment

Please address comments in uploadMultipartPart but overall this looks good!

}
int chunkSize = (int) Math.min(4 * 1024 * 1024L, contentLength - position);
ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
Owner

Can you allocate this outside the loop and reuse it within the loop? You will want to call ByteBuffer.reset here.

Contributor Author

This may be a bad idea because there is no guarantee that the Azure SDK strictly processes one ByteBuffer after another; we could end up overwriting a ByteBuffer that the SDK still holds a reference to.

@klaudworks
Contributor Author

@gaul addressed all your other comments. Thanks for reviewing it.

@klaudworks klaudworks requested a review from gaul November 12, 2025 07:41
@DzeCin

This comment was marked as resolved.

@klaudworks
Contributor Author

@DzeCin what makes you think so? There should be no state persisted in the s3 proxy.

@DzeCin

DzeCin commented Nov 13, 2025

@DzeCin what makes you think so? There should be no state persisted in the s3 proxy.

You are right I misread the code, sorry for the confusion.

@gaul gaul merged commit fb96699 into gaul:master Nov 20, 2025
3 checks passed
@gaul
Owner

gaul commented Nov 20, 2025

Thank you for your contribution @klaudworks! Let me tidy up a few other things but I will run a new release soon.

@klaudworks
Contributor Author

klaudworks commented Nov 20, 2025

@gaul no hurry, I am currently running a custom build in production anyways. Thank you for reviewing the changes!
