feature(s3/manager): add option to control default checksums #3151
feature/s3/manager/upload.go
Outdated
    // isS3ExpressBucket returns true if the bucket has the S3 Express suffix.
    func (u *multiuploader) isS3ExpressBucket() bool {
    	bucketName := aws.ToString(u.in.Bucket)
    	return strings.HasSuffix(bucketName, "--x-s3")
    }
I don't believe this is actually necessary -- if you look at line 336 in this file, we add a post-endpoint-resolution check that defaults the checksum if the endpoint resolution process determined that we were using an express bucket. You should be able to just verify this with an additional unit test.
@lucix-aws Thanks. I'm not quite sure how to add a unit test here because that checksum is added in HandleFinalize, and the mock request doesn't seem to get there.
> the mock request doesn't seem to get there

I don't really understand what this means. No, you're right actually, because that middleware is added in the low-level s3 client, which the tests in this package would all be mocking. So this would have to be done in an integration test, which we as the SDK team would have to handle.
I will try to find some time to take care of that next week.
As announced in aws#2960, AWS SDK for Go v2 service/s3 v1.73.0 shipped a change (aws#2808) that adopted new default integrity protections, automatically calculating CRC32 checksums for operations like PutObject and UploadPart.

While it is possible to revert to the previous behavior by setting `AWS_REQUEST_CHECKSUM_CALCULATION=when_required`, this config setting did not apply to the S3 Manager multipart uploader, which always enabled CRC32 checksums by default regardless of the global setting.

This commit adds a new `RequestChecksumCalculation` field to the Uploader struct that allows users to control checksum behavior:

- `RequestChecksumCalculationWhenSupported` (default): Always calculates CRC32 checksums for multipart uploads
- `RequestChecksumCalculationWhenRequired`: Only calculates checksums when explicitly set by the user, preserving backwards compatibility

For example:

```go
uploader := manager.NewUploader(client, func(u *manager.Uploader) {
	u.RequestChecksumCalculation = aws.RequestChecksumCalculationWhenRequired
})
```

S3 Express One Zone buckets always require CRC32 checksums regardless of this setting, as mandated by the S3 Express service requirements. The uploader automatically detects S3 Express buckets (names ending with `--x-s3`) and applies CRC32 checksums unconditionally.

Fixes aws#3007

Signed-off-by: Stan Hu <stanhu@gmail.com>
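The rules in the commit message above can be read as a small decision function. The following is only a sketch of that description, not the SDK's actual code: `checksumMode`, `whenSupported`, `whenRequired`, and `defaultChecksumAlgorithm` are hypothetical names chosen for illustration, and the precedence given to a user-specified algorithm is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// checksumMode is a hypothetical stand-in for the uploader's
// RequestChecksumCalculation setting described above.
type checksumMode int

const (
	whenSupported checksumMode = iota // default: always add CRC32
	whenRequired                      // add a checksum only if the caller asked for one
)

// defaultChecksumAlgorithm returns the checksum algorithm a multipart upload
// would use under the rules in the commit message, or "" for none.
// userAlgorithm is the algorithm set explicitly on the request ("" if unset).
func defaultChecksumAlgorithm(mode checksumMode, bucket, userAlgorithm string) string {
	// Assumption: an explicit choice by the caller takes precedence.
	if userAlgorithm != "" {
		return userAlgorithm
	}
	// S3 Express One Zone buckets get CRC32 regardless of the setting.
	if strings.HasSuffix(bucket, "--x-s3") {
		return "CRC32"
	}
	if mode == whenSupported {
		return "CRC32"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", defaultChecksumAlgorithm(whenSupported, "my-bucket", ""))
	fmt.Printf("%q\n", defaultChecksumAlgorithm(whenRequired, "my-bucket", ""))
	fmt.Printf("%q\n", defaultChecksumAlgorithm(whenRequired, "my-bucket--usw2-az1--x-s3", ""))
}
```

Note that the review thread above suggests the low-level s3 client already applies the S3 Express default after endpoint resolution, so the suffix check here only mirrors the commit message's wording.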
Patch looks fine, approval pending integration tests but I'll run CI now. Ignore failing codegen and integration tests, those don't work in forks.
…9844)

Updates:
- unpin AWS dependencies and run `make vendor-update`
- add config options to enable checksums only if required by storage in order to preserve backwards compatibility

Related issues:
- #9748
- #8622

Tested with: AWS S3, self-hosted MinIO, and Linode object storage, which was previously failing with multi-part uploads (reported here - #8630 (comment)).

The updated library (PR with the fix - aws/aws-sdk-go-v2#3151) allows overriding multi-part upload configurations so that compatibility can be preserved.

Signed-off-by: Zakhar Bessarab <z.bessarab@victoriametrics.com>
Will this eventually be reworked to respect global setting? Seems confusing that uploader gets its own setting and ignores global one.
…ection

As discussed in aws/aws-sdk-go-v2#3003 and aws/aws-sdk-go-v2#2960, `github.com/aws/aws-sdk-go-v2/service/s3` v1.73.0 changed the AWS SDK default object integrity behavior. Third-party S3 providers, such as Linode, may fail with an `XAmzContentSHA256Mismatch` error as a result.

A workaround is to set `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION` to `when_required`. However, these environment variables do not affect multipart uploads that use the SDK's `manager.Uploader` implementation (aws/aws-sdk-go-v2#3007). Multipart uploads fail with a 400 Bad Request due to the inclusion of `X-Amz-Sdk-Checksum-Algorithm: CRC32` HTTP headers.

With aws/aws-sdk-go-v2#3151, the default integrity protection can be explicitly configured for `manager.Uploader`. To ensure backwards compatibility with third-party S3 providers, this commit adds support for two query parameters:

* `request_checksum_calculation` - `when_supported`, `when_required`
* `response_checksum_calculation` - `when_supported`, `when_required`

For example, on Linode, the defaults don't work:

```
% cat main.go | ./gocdk-blob upload "s3://smybucket?endpoint=https://us-sea-1.linodeobjects.com&region=us-sea-1" main.go
gocdk-blob: closing the writer: blob (key "main.go") (code=Unknown): operation error S3: PutObject, https response error StatusCode: 400, RequestID: <redacted>, HostID: <redacted>, api error XAmzContentSHA256Mismatch: UnknownError
```

Using `request_checksum_calculation=when_required` works:

```
% cat main.go | ./gocdk-blob upload "s3://smybucket?endpoint=https://us-sea-1.linodeobjects.com&region=us-sea-1&request_checksum_calculation=when_required" main.go
%
```

This test was repeated with a larger file to validate that multipart uploads work.
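A bucket URL like the one in the shell example could also be assembled programmatically. This is a minimal sketch using only the Go standard library: `bucketURL` is a hypothetical helper, the parameter name `request_checksum_calculation` is taken from the commit message above, and the code only builds the string without opening a bucket. Unlike the shell example, `url.Values.Encode` percent-encodes the endpoint value.

```go
package main

import (
	"fmt"
	"net/url"
)

// bucketURL builds an s3:// URL with the endpoint, region, and (optionally)
// the request_checksum_calculation query parameter described above.
// It is an illustrative helper, not part of any library.
func bucketURL(bucket, endpoint, region, requestChecksum string) string {
	q := url.Values{}
	q.Set("endpoint", endpoint)
	q.Set("region", region)
	if requestChecksum != "" {
		q.Set("request_checksum_calculation", requestChecksum)
	}
	u := url.URL{Scheme: "s3", Host: bucket, RawQuery: q.Encode()}
	return u.String()
}

func main() {
	fmt.Println(bucketURL("mybucket", "https://us-sea-1.linodeobjects.com", "us-sea-1", "when_required"))
}
```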