Skip to content

reef: librbd/migration/HttpClient: avoid reusing ssl_stream after shut down#61094

Merged
sunilangadi2 merged 10 commits intoceph:reeffrom
idryomov:wip-69178-reef
Jan 17, 2025
Merged

reef: librbd/migration/HttpClient: avoid reusing ssl_stream after shut down#61094
sunilangadi2 merged 10 commits intoceph:reeffrom
idryomov:wip-69178-reef

Conversation

@idryomov
Copy link
Contributor

Logging just the (negated) value makes tracking down the error category
and message unnecessarily hard.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit eb77349)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit d1a83f2)
resolve_host() is called from init() and issue() when transitioning out
of STATE_UNINITIALIZED and from advance_state() right after the call to
shutdown_socket().  In all three cases the socket should get closed, so
drop the redundant call and place asserts in connect() implementations
instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 5699e4b)
ssl_stream objects can't be reused after shut down: despite
a successful reconnect and handshake, any attempt to read or write
fails with "end of stream" (beast.http:1) or "protocol is shutdown"
(asio.ssl:337690831) error respectively.  This doesn't appear to be
documented, but Beast and ASIO authors both mention that the stream
must be destroyed and recreated [1][2].

This was missed because the only integration test with a big enough
image used http instead of https.

[1] boostorg/beast#821 (comment)
[2] chriskohlhoff/asio#804 (comment)

Fixes: https://tracker.ceph.com/issues/69178
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 20885b1)
…ake()

With m_ssl_enabled set to false, disconnect() is a no-op.  Since
m_ssl_enabled is flipped to true only when the handshake succeeds,
calling disconnect() on "failed to complete handshake" error is bogus
(as would be attempting to shut down SSL there).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 331b5ea)
The remaining callers of disconnect() call it only when m_ssl_enabled
is set to true (i.e. after the handshake is completed):

- shut_down(), in STATE_READY
- maybe_finalize_reset(), very shortly after transitioning out of
  STATE_READY as part of performing a reset
- advance_state(), on a transition to STATE_READY that is intercepted
  by a previously delayed shut down

m_ssl_enabled isn't used outside of disconnect() and on top of that
is never cleared.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8566224)
Get rid of get_callback_adapter() which only obfuscates the error:

  handle_handshake: failed to complete SSL handshake: (337047686) Unknown error 337047686

vs

  handle_handshake: failed to complete SSL handshake: certificate verify failed (SSL routines, tls_process_server_certificate)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e305a59)
…wn SSL

Propagate ec to handle_disconnect() and use it to suppress
stream_truncated errors.  Here is a quote from Beast documentation [1]:

  // Gracefully shutdown the SSL/TLS connection
  error_code ec;
  stream.shutdown(ec);
  // Non-compliant servers don't participate in the SSL/TLS shutdown process and
  // close the underlying transport layer. This causes the shutdown operation to
  // complete with a `stream_truncated` error. One might decide not to log such
  // errors as there are many non-compliant servers in the wild.
  if(ec != net::ssl::error::stream_truncated)
      log(ec);

... and a commit that made ignoring stream_truncated safe [2]:

  // ssl::error::stream_truncated, also known as an SSL "short read",
  // indicates the peer closed the connection without performing the
  // required closing handshake
  // [...]
  // When a short read would cut off the end of an HTTP message,
  // Beast returns the error beast::http::error::partial_message.
  // Therefore, if we see a short read here, it has occurred
  // after the message has been completed, so it is safe to ignore it.

[1] https://www.boost.org/doc/libs/develop/libs/beast/doc/html/beast/using_io/ssl_tls_shutdown.html
[2] boostorg/beast@094f5ec

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9fa0bcc)
If the shutdown gets delayed until the state transition from
STATE_RESET_CONNECTING completes and the reconnect is successful
(i.e. next_state is STATE_READY), we eventually hit "unexpected
state transition" assert in advance_state().  The reason is that
advance_state() would update m_state and call disconnect() under
STATE_READY instead of STATE_SHUTTING_DOWN.  After the disconnect
maybe_finalize_shutdown() would enter advance_state() again with
STATE_SHUTDOWN as next_state, but the transition to that from
STATE_READY is invalid.

Plug this by not transitioning to next_state if current_state is
STATE_SHUTTING_DOWN.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1046d61)
…nsitions

If shut_down() gets delayed until a) the state transition from
STATE_RESET_CONNECTING completes and the reconnect is unsuccessful or
b) the state transition from STATE_RESET_DISCONNECTING completes (i.e.
next_state is STATE_UNINITIALIZED or STATE_RESET_CONNECTING), the
socket needs to be shut down before m_on_shutdown is invoked.  The line
of thought here is the same as for the corresponding state transitions
that don't involve STATE_SHUTTING_DOWN.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 88557df)
@idryomov idryomov added this to the v18.2.5 milestone Dec 16, 2024
@idryomov idryomov requested a review from a team as a code owner December 16, 2024 08:54
@sunilangadi2
Copy link

jenkins test make check

1 similar comment
@yuriw
Copy link
Contributor

yuriw commented Jan 8, 2025

jenkins test make check

@yuriw
Copy link
Contributor

yuriw commented Jan 10, 2025

This is ready for merge as soon as make check passed
ref: https://tracker.ceph.com/issues/69414

@yuriw
Copy link
Contributor

yuriw commented Jan 13, 2025

jenkins test make check

1 similar comment
@yuriw
Copy link
Contributor

yuriw commented Jan 16, 2025

jenkins test make check

@sunilangadi2 sunilangadi2 merged commit a914838 into ceph:reef Jan 17, 2025
@idryomov idryomov deleted the wip-69178-reef branch January 17, 2025 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants