Skip to content

rgw: cumulatively fix 6 AWS SigV4 request failure cases#54856

Merged
cbodley merged 8 commits intoceph:mainfrom
linuxbox2:wip-accept-new-awssigv4
Feb 16, 2024
Merged

rgw: cumulatively fix 6 AWS SigV4 request failure cases#54856
cbodley merged 8 commits intoceph:mainfrom
linuxbox2:wip-accept-new-awssigv4

Conversation

@mattbenjamin
Copy link
Copy Markdown
Contributor

@mattbenjamin mattbenjamin commented Dec 11, 2023

These changes address checksum header identification and signing
algorithm selection, including checksum trailer verification
for signed- and unsigned-payload cases.

These changes address all the actual S3 request failures I have
so far been able to reproduce, with and without content checksums
and/or new trailing checksum headers, and with and without
SSL.

Fixes: https://tracker.ceph.com/issues/63153

Specifically, it fixes the request failures that motivated the
initial tracker filing. It extracts but does not validate new client content
checksums if present. Validation and management of new
S3 content-checksum headers will follow in a subsequent change.

squashed commits:

  • wip chunk meta parsing--seem to have first AWSv4ComplMulti::ChunkMeta::create_next sort of parsing
  • use constexpr sarlen(...) for static array lengths throughout rgw_auth_s3.cc
  • link AWSv4CompleMulti::ChunkMeta to its enclosing completer
  • capture original content-length header before AWSv4ComplMulti overwrites it
  • mostly extract the trailer
  • fix misordered content-length, experiment w/exbuf
  • save leftover bytes between calls to AWSv4ComplMulti::recv_chunk()
  • propagate data_offset_in_stream from AWSv4ComplMulti::recv_chunk()
  • clean up trailer section extract
  • trailer section cleanup and introduce extract_helper
  • unrolled checksum extract--fixup
  • integrate matching with types from rgw_cksum.h
  • fix sv_trailer end pos, and cleanup
  • add proplist interface to rgw::auth::Completer and AWSv4ComplMulti
  • spliterate trailers
  • check completer props
  • redefine prop_map to point into already-allocated trailer_vec
  • hax: thread a counter onto AWSv4ComplMulti recv_body() and recv_chunk path
  • fix apparent bug where due to reads less than chunk_size induce a final, zero-length read that was skipped before forcing recognition of the last chunk in the stream
  • check only for a trailing checksum named in x-amz-trailer
  • don't try to match signatures when no signature provided (because streaming unsigned)
  • oops, fix content_length decl
  • fix recognition of next chunk envelope in unsigned aws-chunk case
  • clean up AWSv4CompMulti flags and correctly detect aws unsigned chunked
  • rework checksum-trailer extraction and introduce AWSv4ComplMulti::calc_v4_trailing_signature
  • thread const struct req_state* into AWSv4ComplMulti
  • large cleanup of trailer parsing, no regression
  • fix trailer signature calculation--checks
  • correctly generate final chunk hmac
  • typo in comment
  • verify trailing signature when expected (using expected final chunk signature)
  • move trailer_vec back onto recv_body()'s stack
  • remove strange completer comment
  • remove last_frag (now points into parsing_buf)
  • remove implied dependency on content_length
  • move trailer recognition to AWSv4ComplMulti::complete()
  • remove now-unused is_last_chunk() predicate

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Copy link
Copy Markdown
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, the parsing changes make sense

@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch from a3b1509 to 987a64c Compare December 21, 2023 20:51
@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch 2 times, most recently from f42fe45 to fd143ea Compare December 30, 2023 23:10
@mattbenjamin
Copy link
Copy Markdown
Contributor Author

@cbodley this should be ready for re-review
junit test suite is here: git@github.com:linuxbox2/jcksum.git (52 tests)

@mattbenjamin mattbenjamin removed the DNM label Dec 30, 2023
@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch from fd143ea to f838970 Compare December 30, 2023 23:36
@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch 2 times, most recently from 7433507 to ad18207 Compare December 31, 2023 15:31
@mattbenjamin
Copy link
Copy Markdown
Contributor Author

jenkins test windows

@mattbenjamin
Copy link
Copy Markdown
Contributor Author

mattbenjamin commented Dec 31, 2023

something broke the windows build mid-morning. it's complaining about an invalid boost submodule hash. this PR doesn't touch boost.

@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch from ad18207 to b35b925 Compare December 31, 2023 16:50
Comment on lines +380 to +374
inline void put_prop(const std::string_view k, const std::string_view v) {
/* assume the caller will mangle the key name, if required */
auto& map = const_cast<env_map_t&>(s->info.env->get_map());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tend to view const_cast as a code smell, since it's usually just a shortcut to avoid propagating const correctly. it's necessary here because a) we only get access to const req_state* in the constructor, and b) RGWEnv only exposes a const-qualified get_map()

since put_prop() is only called during AWSv4ComplMulti::complete() now, could we extend the Completer interface to take a non-const RGWEnv& env there so we can call env.set(k, v) instead?

that way we preserve the original intent for the Completer to have const-only access to the input req_state, while
the complete(RGWEnv& env) signature hopefully makes it clear that it may add trailing headers

Copy link
Copy Markdown
Contributor Author

@mattbenjamin mattbenjamin Dec 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current dance around hiding things from Completer is kind of daft, to be honest. I'd prefer not to make further backflips here. Notice that the code I submitted does not widen Completer at all, only AWSv4ComplMulti, which has a reason to modify the req_state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(at the same time, there's clearly no good justification for unwinding the the haskellish style of this entire subsystem--that said, what I'm trying to say is, yes, there's a clear inconsistency--but having a special call to modify the req_state is the code smell, imo.)

@mattbenjamin mattbenjamin force-pushed the wip-accept-new-awssigv4 branch from b35b925 to 2c0539c Compare December 31, 2023 20:16
@mattbenjamin
Copy link
Copy Markdown
Contributor Author

mattbenjamin commented Dec 31, 2023

I added logic to throw an exception if we would exceed trailer_buf_size-1 bytes of trailer, using ERR_LIMIT_EXCEEDED as the error code

@mattbenjamin
Copy link
Copy Markdown
Contributor Author

jenkins test windows

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 11, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

apologies, i had forgotten about these conflicts when i merged #55250

These changes address checksum header identification and signing
algorithm selection, including checksum trailer verification
for signed- and unsigned-payload cases.

These changes address all the actual S3 request failures I have
so far been able to reproduce, with and without content checksums
and/or new trailing checksum headers, and with and without
SSL.

Fixes: https://tracker.ceph.com/issues/63153

Specifically, it fixes the request failures that motivated the
initial tracker filing.  It extracts but does not validate new client
content checksums if present.  Validation and management of new
S3 content-checksum headers will follow in a subsequent change.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

squashed commits:

* wip chunk meta parsing--seem to have first AWSv4ComplMulti::ChunkMeta::create_next sort of parsing
* use constexpr sarlen(...) for static array lengths throughout rgw_auth_s3.cc
* link AWSv4CompleMulti::ChunkMeta to its enclosing completer
* capture original content-length header before AWSv4ComplMulti overwrites it
* mostly extract the trailer
* fix misordered content-length, experiment w/exbuf
* save leftover bytes between calls to AWSv4ComplMulti::recv_chunk()
* propagate data_offset_in_stream from AWSv4ComplMulti::recv_chunk()
* clean up trailer section extract
* trailer section cleanup and introduce extract_helper
* unrolled checksum extract--fixup
* fix sv_trailer end pos, and cleanup
* add proplist interface to rgw::auth::Completer and AWSv4ComplMulti
* spliterate trailers
* check completer props
* redefine prop_map to point into already-allocated trailer_vec
* hax: thread a counter onto AWSv4ComplMulti recv_body() and recv_chunk path
* fix apparent bug where due to reads less than chunk_size induce a final, zero-length read that was skipped before forcing recognition of the last chunk in the stream
* check only for a trailing checksum named in x-amz-trailer
* don't try to match signatures when no signature provided (because streaming unsigned)
* oops, fix content_length decl
* fix recognition of next chunk envelope in unsigned aws-chunk case
* clean up AWSv4CompMulti flags and correctly detect aws unsigned chunked
* rework checksum-trailer extraction and introduce AWSv4ComplMulti::calc_v4_trailing_signature
* thread const struct req_state* into AWSv4ComplMulti
* large cleanup of trailer parsing, no regression
* fix trailer signature calculation--checks
* correctly generate final chunk hmac
* typo in comment
* verify trailing signature when expected (using expected final chunk signature)
* move trailer_vec back onto recv_body()'s stack
* remove strange completer comment
* remove last_frag (now points into parsing_buf)
* remove implied dependency on content_length
* move trailer recognition to AWSv4ComplMulti::complete()
* remove now-unused is_last_chunk() predicate
* remove unused ChunkMeta::completer
* responses to review comments
* when trailer is sig expected, fail (only) if none present or if it does not match calculated
* remove stale parse_content_length(...) decl
* remove now-unused AWSv4ComplMulti::content_length
* fix extract_helper end search position as in mut_extract_helper
* change "\n" reserve term in get_canon_amz_hdrs() part of the sum (review)
    and initialize length to 0
* remove debugging code

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
This can be run by hand with:

./mvnw clean package
./mvnw test -Dtest=PutObjects

The following properties are sourced from the environment:

    AWS_ACCESS_KEY_ID
    AWS_SECRET_ACCESS_KEY

    RGW_HTTP_ENDPOINT_URL
    RGW_HTTPS_ENDPOINT_URL

Then adds:

qa/workunits/rgw: add test driver script for maven suite

Launch it fromn cls.yaml, as with test_librgw_file.h.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
1. correctly match signature of 0-length chunk
2. initialize lf_bytes

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
The junit5 suite in fact chooses selects transport security (SSL)
strictly from the endpoint URL.  The test_awssdkv4_sig.sh (or its
caller?) only needs to export RGW_HTTP_ENDPOINT_URL appropriately
to get one or the other.

Fix several mistakes in refactoring caught by Ali Maredia.
Print AccessKey, SecretKey and EndpointURL on startup

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
mattbenjamin and others added 3 commits February 14, 2024 12:02
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Copy link
Copy Markdown
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great if it's passing tests

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 15, 2024

The following tests FAILED:
235 - unittest_posix_bucket_cache (Failed)

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 15, 2024

jenkins test make check

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 15, 2024

The following tests FAILED:
34 - run-rbd-unit-tests-127.sh (Timeout)
198 - unittest_osdmap (Subprocess aborted)

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 15, 2024

jenkins test make check

@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Feb 16, 2024

passed qa in https://pulpito.ceph.com/cbodley-2024-02-15_03:24:02-rgw-wip-accept-new-awssigv4-distro-default-smithi/

verified that java test are running in teuthology.log:

Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.619 s -- in io.ceph.jcksum.PutObjects

@WGH-
Copy link
Copy Markdown

WGH- commented Jul 9, 2024

For cross-reference, this likely fixes an issue in some Go S3 client: minio/minio-go#1235. That one is about 0-byte chunk-signature for empty object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants