Skip to content

librbd/migration/HttpClient: avoid reusing ssl_stream after shut down#61079

Merged
idryomov merged 10 commits intoceph:mainfrom
idryomov:wip-69178
Dec 15, 2024
Merged

librbd/migration/HttpClient: avoid reusing ssl_stream after shut down#61079
idryomov merged 10 commits intoceph:mainfrom
idryomov:wip-69178

Conversation

@idryomov
Copy link
Contributor

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Logging just the (negated) value makes tracking down the error category
and message unnecessarily hard.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
resolve_host() is called from init() and issue() when transitioning out
of STATE_UNINITIALIZED and from advance_state() right after the call to
shutdown_socket().  In all three cases the socket should get closed, so
drop the redundant call and place asserts in connect() implementations
instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ssl_stream objects can't be reused after shut down: despite
a successful reconnect and handshake, any attempt to read or write
fails with "end of stream" (beast.http:1) or "protocol is shutdown"
(asio.ssl:337690831) error respectively.  This doesn't appear to be
documented, but Beast and ASIO authors both mention that the stream
must be destroyed and recreated [1][2].

This was missed because the only integration test with a big enough
image used http instead of https.

[1] boostorg/beast#821 (comment)
[2] chriskohlhoff/asio#804 (comment)

Fixes: https://tracker.ceph.com/issues/69178
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…ake()

With m_ssl_enabled set to false, disconnect() is a no-op.  Since
m_ssl_enabled is flipped to true only when the handshake succeeds,
calling disconnect() on "failed to complete handshake" error is bogus
(as would be attempting to shut down SSL there).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The remaining callers of disconnect() call it only when m_ssl_enabled
is set to true (i.e. after the handshake is completed):

- shut_down(), in STATE_READY
- maybe_finalize_reset(), very shortly after transitioning out of
  STATE_READY as part of performing a reset
- advance_state(), on a transition to STATE_READY that is intercepted
  by a previously delayed shut down

m_ssl_enabled isn't used outside of disconnect() and on top of that
is never cleared.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Get rid of get_callback_adapter() which only obfuscates the error:

  handle_handshake: failed to complete SSL handshake: (337047686) Unknown error 337047686

vs

  handle_handshake: failed to complete SSL handshake: certificate verify failed (SSL routines, tls_process_server_certificate)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…wn SSL

Propagate ec to handle_disconnect() and use it to suppress
stream_truncated errors.  Here is a quote from Beast documentation [1]:

  // Gracefully shutdown the SSL/TLS connection
  error_code ec;
  stream.shutdown(ec);
  // Non-compliant servers don't participate in the SSL/TLS shutdown process and
  // close the underlying transport layer. This causes the shutdown operation to
  // complete with a `stream_truncated` error. One might decide not to log such
  // errors as there are many non-compliant servers in the wild.
  if(ec != net::ssl::error::stream_truncated)
      log(ec);

... and a commit that made ignoring stream_truncated safe [2]:

  // ssl::error::stream_truncated, also known as an SSL "short read",
  // indicates the peer closed the connection without performing the
  // required closing handshake
  // [...]
  // When a short read would cut off the end of an HTTP message,
  // Beast returns the error beast::http::error::partial_message.
  // Therefore, if we see a short read here, it has occurred
  // after the message has been completed, so it is safe to ignore it.

[1] https://www.boost.org/doc/libs/develop/libs/beast/doc/html/beast/using_io/ssl_tls_shutdown.html
[2] boostorg/beast@094f5ec

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If the shutdown gets delayed until the state transition from
STATE_RESET_CONNECTING completes and the reconnect is successful
(i.e. next_state is STATE_READY), we eventually hit "unexpected
state transition" assert in advance_state().  The reason is that
advance_state() would update m_state and call disconnect() under
STATE_READY instead of STATE_SHUTTING_DOWN.  After the disconnect
maybe_finalize_shutdown() would enter advance_state() again with
STATE_SHUTDOWN as next_state, but the transition to that from
STATE_READY is invalid.

Plug this by not transitioning to next_state if current_state is
STATE_SHUTTING_DOWN.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…nsitions

If shut_down() gets delayed until a) the state transition from
STATE_RESET_CONNECTING completes and the reconnect is unsuccessful or
b) the state transition from STATE_RESET_DISCONNECTING completes (i.e.
next_state is STATE_UNINITIALIZED or STATE_RESET_CONNECTING), the
socket needs to be shut down before m_on_shutdown is invoked.  The line
of thought here is the same as for the corresponding state transitions
that don't involve STATE_SHUTTING_DOWN.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

@idryomov
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants