rgw: implement S3 additional checksum support#55076
Conversation
eb9cded to
77c073e
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
i revisited the discussion on @imtzw's original pr and was reminded of this one: #49986 (comment)
it doesn't look like this multipart behavior was implemented here either? |
|
PutObject is missing response headers CreateMultipartUpload missing CompleteMultipartUpload is missing:
|
|
i pushed https://github.com/cbodley/ceph/commits/wip-rgw-cksum, which is a rebase of this branch on top of the latest in #54856, plus 3 commits which get the non-multipart test cases in ceph/s3-tests#540 passing. i still can't get xxhash to compile so i commented it out of rgw_cksum.h, but didn't include those changes |
|
it's possible that the blake3 and xx3 mechanisms should be skipped for now; I really wish to have them for future--but for now, there'd be no way to access them; it does seem like we could switch our xxh64 callers to xx3, almost certainly? |
agree that we should focus on the sdk-supported algorithms for now i do think we can use our boto extension strategy to teach it about the new checksum algorithms. that should work for the low-level operations like put_object() etc. where the caller passes in their own computed checksum. we just won't be able to get the higher-level sdk operations like upload_file() to compute those checksums for us |
|
@mattbenjamin pls rebase |
7042084 to
2618a0e
Compare
|
generic rgw suite run I think passed: 2nd rgw suite run with new checksum tests in progress: |
|
rgw/dbstore:
a lot of multipart test cases are filtered out using rgw/lifecycle:
i can't tell why that one failed |
I've been seeing some unexpected lc failures--I saw this in a scenario where it looked like my test system went into screen lock; more recently, I saw consistent failures during specific times of the day (but not in this test). |
|
rerun 3 (rerun 2 had wrong baseline): |
Among other things, provides XXH128 and XXH3. Includes compile fixes for gcc-13. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Fast, cryptographic hash functions suitable for block checksums. BLAKE3 is the faster, more portable successor to Blake2(b,s), now with 4x throughput. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Adds new Blake3 digest format (native), a concrete type to represent digests, and static_visitor machinery to unify varying checksum computations. This framework, together with new trailing checksum header support, is used to implement S3 additional checksum verification. Parts of the AWS content checksum API work build on a prior contribution from imtzw <tongzhiwei_yewu@cmss.chinamobile.com>. Thank you! Fixes: https://tracker.ceph.com/issues/42080 Fixes: https://tracker.ceph.com/issues/63951 squashed commits: * rgw_cksum: add trival test vectors for sha-format digests Computed digests match those produced by sha1sum, sha256sum, and sha512sum utilities. * rgw_cksum: add test vectors for blake3 Tests the same input strings with digests validated by b3sum (https://crates.io/crates/b3sum). * rgw_ckum: switch to accel crc32c The internal Ceph convention appears to be to omit a final xor where ceph_crc32c is used, but it's required for compatibility with AWS implementations. * rgw_cksum: add XXH3 digest * rgw_cksum: write class encoder for rgw::digest::Cksum * rgw_cksum: also reverse crc32c (REBASEME) Mark noticed that the crc32c output was being tested against a byteswapped value (crc32c also needs byteswap on LE). * rgw_cksum: add digest::Cksum serde tests * rgw_cksum: fix main(...) linkage (so we run our main unit and not the one in gmock * rgw_cksum: convenience extensions for integration with RGW/S3 * introduce rgw_cksum unique_ptr factory * rgw_cksum: mark string transform accessors const * rgw_cksum: fixup unittest_rgw_chksum compilation--all existing tests pass * rgw_cksum: hook up put-object checksum workflow * tweaks to report on content checksum mismatch * rgw_cksum: match SDK as well as general checksum header * make it more efficient * initialize RGWPutObj_Cksum::digest * rgw_cksum: write parse_cksum_type w/const char* arg * initialize _type correctly; doing armored wrong * fix expected checksum header name, clean up verify * fix output on checksum verify fail, cleanup * introduce Cksum::to_armor(); all AWS cases pass * oops, extra 0-byte at end of to_armor() result * use to_armor() with decoded checksums (i.e., for all S3 presentation) * remove unnecessary finalize() in RGWPutObj_Cksum dtor * RGWPutObj_Cksum::Factory fixes * fixes test_object_checksum_sha256 * choose preferred checksum algorithm header if both are present * log verified checksums in RGWPutObj::execute at 16 * checksum not needed in policy condition * fix checksum trailing header format * move Blake3 to rgw_digest_blake3.h Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
I.e., prove cksum == Cksum(cksum.to_armor().c_str()) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
includes commits:
* fixes init-multipart header return
* introduce checksum to SAL MultipartPart interface
* thread optional checksum through DataProcessor
* code complete multipart checksum verify
* fix formatter
* fix ckecksum format for multipart objects in GET/HEAD ops
* always return parts_count from ReadOp::prepare() if applicable
This behavior is used when returning the checksum of a multipart
upload object.
* tweak conditional multipart_parts_count
* add checksum output to ListMultipart
* fix nil-return from GetHeaderCksumResult
* re-arm truncated if re-entering list-parts
* complete-multipart w/list-parts
* validate supplied checksum in CompleteMultipart
* verify checksum type against initial checksum algorithm
* rgw_op: suppress more x-amz headers
* final fixes and cleanups
* remove unused t0
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
* fix to deal with parts_count == 1 asymmetry Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
parts_count now returns a value for parts_count whenever the target object is a multipart upload. this has no additional overhead, since this can be read off the manifest Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
* properly transform pseudo headers in PostObj * enable cksum verify in PostObj * match checksum headers in match_policy_vars * fixup add POST headers to environment Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
* remove rgw_cksum_pipe state enum, not needed [Casey review]
* remove a format that just took a single string substitution
and passed it to an iostream [Casey review]
* use boost::to_upper* [Casey review]
* remove unused RGW_ATTR_CKSUM_ALGORITHM decl [Casey review]
* negate error code values in two places [Casey review]
* split cksum digests from base type decls
* resolve comment when checksum requested but not available
* remove redundant memset
* remove junk from rgw_blake3_digest.h
* s/ldpp_dout + fmt::format/ldpp_dout_fmt/g;
* fix conditional return of parts_count
from RGWRados::Object::prepare(). A value for parts_count should
be returned iff a *multipart* object manifest exists.
* remove /tmp output test
* finish moving ceph_crypto headers out of rgw_cksum.h
* consume the optional in multipart_parts_count
* target_attrs can be a reference (but not const)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Allow MultipartUpload::complete() to have previously called Upload::get_info() without an additional round-trip to fetch attributes. Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Added EXCLUDE_FROM_ALL to prevent gristing files in the subdir. Signed-off-by: Casey Bodley <cbodley@redhat.com> Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
2618a0e to
18ab9f5
Compare
|
jenkins retest this please |
| [submodule "src/boost_redis"] | ||
| path = src/boost_redis | ||
| url = https://github.com/boostorg/redis.git |
There was a problem hiding this comment.
this was added back when you resolved the merge conflict. i opened #58428 to remove it again
we missed this part, so main fails to build in debug mode. i pushed my commit to a new branch and opened #58429 to fix |
|
@cbodley @mattbenjamin |
thanks @rkhudov. we discussed this in slack, but for others seeing this: update your submodule with something like |
rgw: implement S3 additional checksum support
Adds new Blake3 digest format (native), a concrete type to
represent digests, and static_visitor machinery to unify varying
checksum computations.
This framework, together with new trailing checksum header support,
is used to implement S3 additional checksum verification. Parts of
the AWS content checksum API work build on a prior contribution from
imtzw tongzhiwei_yewu@cmss.chinamobile.com.
Thank you!
Fixes: https://tracker.ceph.com/issues/42080
Fixes: https://tracker.ceph.com/issues/63951
squashed commits:
Computed digests match those produced by sha1sum, sha256sum,
and sha512sum utilities.
Tests the same input strings with digests validated by
b3sum (https://crates.io/crates/b3sum).
The internal Ceph convention appears to be to omit a final
xor where ceph_crc32c is used, but it's required for compatibility
with AWS implementations.
Mark noticed that the crc32c output was being tested against a
byteswapped value (crc32c also needs byteswap on LE).
(so we run our main unit and not the one in gmock
Signed-off-by: Matt Benjamin mbenjamin@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e