Fix EncryptedBlobStore support for S3-compatible backends without eTag parts number suffix #825
I can't look at this right now but does the undocumented S3 API to return the number of parts help here? https://github.com/gaul/undocumented-s3-apis?tab=readme-ov-file#get-object-by-multipart-number
…g suffix
Some mostly S3 REST API compatible storage backends do not return the number of multipart upload parts as a suffix to the eTag as Amazon does and as the previous code expects. One example is the NetApp StorageGRID S3 REST API.
The old code had a fallback that simply assumed one encrypted part, but this is wrong in the multipart upload case for these backends, so it is replaced in this commit.
Example eTag structure from real AWS S3 for multipart uploads with 2 parts: `xyzabc-2`
In this case the number of parts is also not present in the object metadata, because the S3 API sets metadata when the multipart upload is started, not when it is completed, so the number of parts is not yet known at that earlier point in time.
This change adds a third fallback option that reads the number of parts from the final part padding, which is the only place where it is guaranteed to be present for encrypted blobs.
This adds one additional GET request to the backend, only for these backends and only for the 64 bytes of the padding.
5f931a3 to b420e92
Unfortunately the As neither this additional header nor the eTag suffix is actually specified among the S3 API common response headers, I guess there may be more "mostly S3 compatible" storage backends that lack them for multipart uploads.
I rebased on master to get the Python yield CI fix. Could you please trigger the GitHub Actions run again?
Thank you for your contribution @polarctos! |
Some mostly S3 REST API compatible storage backends do not return the number of multipart upload parts as a suffix to the eTag as Amazon does and as the previous code expects. One example is the NetApp StorageGRID S3 REST API.
The old code had a fallback that simply assumed one encrypted part, but this is wrong in the multipart upload case for these backends, so this problematic behaviour is replaced in this commit.
Example eTag structure from real AWS S3 for multipart uploads, with the suffix for 2 parts: `xyzabc-2`
On these backends the number of parts is also not present in the object metadata, because metadata can only be set through the S3 API when the multipart upload is started, not when it is completed, so the number of parts is not yet known at that earlier point in time.
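To illustrate the suffix form shown above, here is a minimal sketch of how a part count could be read from an eTag and how a missing suffix can be detected so that a caller knows to fall back. The class and method names are hypothetical and are not taken from the actual change.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not the code from this PR. */
final class ETagParts {
    private ETagParts() {}

    /**
     * Returns the part count when the eTag ends in "-<digits>" (the AWS S3
     * multipart form, e.g. xyzabc-2). Returns empty when there is no such
     * suffix, so the caller can try another source for the part count.
     */
    static OptionalInt partCount(String eTag) {
        if (eTag == null) {
            return OptionalInt.empty();
        }
        // eTags are often transported with surrounding double quotes.
        String value = eTag.replace("\"", "");
        int dash = value.lastIndexOf('-');
        if (dash < 0 || dash == value.length() - 1) {
            return OptionalInt.empty();
        }
        String suffix = value.substring(dash + 1);
        if (!suffix.chars().allMatch(Character::isDigit)) {
            return OptionalInt.empty();
        }
        try {
            int parts = Integer.parseInt(suffix);
            return parts > 0 ? OptionalInt.of(parts) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // Digit-only suffix that does not fit into an int.
            return OptionalInt.empty();
        }
    }
}
```

With this sketch, `ETagParts.partCount("xyzabc-2")` yields 2, while an eTag without a suffix yields an empty result instead of an assumed single part.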
This change adds a third fallback option that reads the number of encrypted parts from the final part padding, which is the only place where it is guaranteed to be present for encrypted blobs.
This adds one additional GET request to the backend, only for these backends and only for the 64 bytes of the padding. Before this change, the returned sizes were incorrect for multipart uploads with more than one part on backends without the eTag parts number suffix.
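As a rough sketch of what "only the 64 bytes of the padding" means for the extra request, the byte range of the final padding can be computed from the stored blob size. The names below are hypothetical, and the sketch deliberately does not show how the part count is encoded inside the padding, since that layout is not described here.

```java
/** Hypothetical helper for illustration only; not the code from this PR. */
final class PaddingRange {
    /** Padding length in bytes, as stated in the description. */
    static final long PADDING_LENGTH = 64;

    private PaddingRange() {}

    /**
     * Builds an HTTP Range header value that covers only the last
     * PADDING_LENGTH bytes of a stored blob of the given size.
     */
    static String finalPaddingRange(long storedSize) {
        if (storedSize < PADDING_LENGTH) {
            throw new IllegalArgumentException(
                    "blob is smaller than the padding: " + storedSize);
        }
        long first = storedSize - PADDING_LENGTH;
        long last = storedSize - 1; // range bounds are inclusive
        return "bytes=" + first + "-" + last;
    }
}
```

For a stored blob of 1000 bytes this gives `bytes=936-999`, so the extra GET transfers 64 bytes regardless of the blob size.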