Skip to content

tentacle: rgw/admin: Fix assert on datalog list of invalid shard#64898

Merged
SrinivasaBharath merged 9 commits intoceph:tentaclefrom
adamemerson:wip-70882-tentacle
Aug 28, 2025
Merged

tentacle: rgw/admin: Fix assert on datalog list of invalid shard#64898
SrinivasaBharath merged 9 commits intoceph:tentaclefrom
adamemerson:wip-70882-tentacle

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Aug 7, 2025

Original tracker: https://tracker.ceph.com/issues/70882

Fixes: https://tracker.ceph.com/issues/72483

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

Fixes: https://tracker.ceph.com/issues/70882
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit b84d7e3)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
I had been thinking of list and trim as purely internal interfaces,
but they are called through HTTP and thus need to be prepared for bad
input.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 9d4968b)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Didn't include `associated_cancellation_slot.hpp`.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 26ef486)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Asio does not have nearly as many actual explicit concepts one can use
as one might like.

And there's no reason we might not want our own asynchrony-related concepts.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 07c77b0)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Reimplement with `initiate` rather than the old style. This
necessitates getting rid of the old `async::Completion` in anything
that was calling it, and other changes.

Also, use disposition for error handling.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit e81d4ea)

Conflicts:
	src/rgw/driver/rados/rgw_datalog.cc
 - This commit was on top of another bugfix with changes to
   startup/shutdown that has not yet merged.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
A convenience function for turning coroutines that return values and
use exceptions, `error_code`, or similar into `int`-returning
functions that take references to out parameters.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 9f17b63)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
This avoids having two entry points with different error checking
preparation, etc. to get out of sync or have a fix get forgotten.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 4732944)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Easier to debug.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit bb93aa3)

Conflicts:
	src/rgw/driver/rados/rgw_datalog.cc
	src/rgw/driver/rados/rgw_datalog.h
 - Commit was made against the fix for 71066, whose backport is still
   awaiting merge.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding this backport until https://tracker.ceph.com/issues/72692 is addressed.

@idryomov idryomov added the DNM label Aug 21, 2025
@idryomov
Copy link
Contributor

@adamemerson Could you please cherry pick bd449f0 into this PR?

dispatch() is allowed to invoke the completion object in the current
thread, before control returns from dispatch().  This isn't desirable
when it comes to discarding version requests in MonClient::shutdown()
and MonClient::_reopen_session() because completion objects could then
be invoked under monc_lock.  In case of MonClient::_reopen_session() in
particular, this leads to an attempt to acquire monc_lock once again in
MonClient::get_version() on a retry due to monc_errc::session_reset
that is converted to errc::resource_unavailable_try_again:

  MonClient::ms_handle_reset
    < takes monc_lock >
    MonClient::_reopen_session
      < invokes the completion object via dispatch() with ec == monc_errc::session_reset >
      Objecter::CB_Objecter_GetVersion::operator() [ ec == errc::resource_unavailable_try_again ]
        Objecter::_wait_for_latest_osdmap
          MonClient::get_version
            < attempts to take monc_lock in the body of the lambda >

The end result is either a lockup or some form of undefined behavior.
The best possible outcome here is an exception (std::system_error with
"Resource deadlock avoided" error) and a successive call to
std::terminate().

This is a regression introduced in commit e81d4ea ("common/async:
Update `use_blocked` for newer asio").  Revert to posting version
request completions for the error cases in a way that is uniform with
the success case in MonClient::handle_get_version_reply().

Fixes: https://tracker.ceph.com/issues/72692
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bd449f0)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
@adamemerson
Copy link
Contributor Author

@adamemerson Could you please cherry pick bd449f0 into this PR?

Done, thank you.

@idryomov
Copy link
Contributor

jenkins test make check

@idryomov idryomov removed the DNM label Aug 25, 2025
@sseshasa
Copy link
Contributor

@SrinivasaBharath SrinivasaBharath merged commit f239f0b into ceph:tentacle Aug 28, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants