Skip to content

osdc: Objecter::linger_by_cookie() for safe cast from uint64#65698

Merged
ljflores merged 2 commits intoceph:mainfrom
cbodley:wip-72771
Nov 21, 2025
Merged

osdc: Objecter::linger_by_cookie() for safe cast from uint64#65698
ljflores merged 2 commits intoceph:mainfrom
cbodley:wip-72771

Conversation

@cbodley
Copy link
Contributor

@cbodley cbodley commented Sep 26, 2025

a linger_ops_set was added for Objecter::handle_watch_notify() as a safety check before casting uint64_t cookie to LingerOp* and deferencing it

neorados also made use of this set through Objecter::is_valid_watch() checks. however, this approach was still susceptible to use-after-free, because the callers didn't preserve a LingerOp reference between this check and its use - and the Objecter lock is dropped in between. in addition, neorados::RADOS::unwatch_() was missing its check for is_valid_watch()

librados did not make use of this is_valid_watch() at all, so was casting cookies directly to LingerOp* and dereferencing. this results in use-after-free for any cookies invalidated by linger_cancel() - for example when called by CB_DoWatchError

replace is_valid_watch() with a linger_by_cookie() function that

  • performs the validity check with linger_ops_set,
  • safely reinterpret_casts the cookie to LingerOp*, and
  • returns a reference to the caller via intrusive_ptr<LingerOp>

librados::IoCtxImpl::watch_check(), unwatch() and aio_unwatch() now call linger_by_cookie(), so have to handle the null case by returning -ENOTCONN (this matches neorados' existing behavior)

Fixes: https://tracker.ceph.com/issues/72771

Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

preserve a reference to LingerOp in case their invocation races with
another linger_cancel()

Signed-off-by: Casey Bodley <cbodley@redhat.com>
a `linger_ops_set` was added for `Objecter::handle_watch_notify()`
as a safety check before casting `uint64_t cookie` to `LingerOp*`
and deferencing it

neorados also made use of this set through `Objecter::is_valid_watch()`
checks. however, this approach was still susceptible to use-after-free,
because the callers didn't preserve a LingerOp reference between this
check and its use - and the Objecter lock is dropped in between. in
addition, `neorados::RADOS::unwatch_()` was missing its check for
`is_valid_watch()`

librados did not make use of this `is_valid_watch()` at all, so was
casting cookies directly to LingerOp* and dereferencing. this results
in use-after-free for any cookies invalidated by `linger_cancel()` -
for example when called by `CB_DoWatchError`

replace `is_valid_watch()` with a `linger_by_cookie()` function that
* performs the validity check with `linger_ops_set`,
* safely reinterpret_casts the cookie to LingerOp*, and
* returns a reference to the caller via intrusive_ptr<LingerOp>

`librados::IoCtxImpl::watch_check()`, `unwatch()` and `aio_unwatch()`
now call `linger_by_cookie()`, so have to handle the null case by
returning `-ENOTCONN` (this matches neorados' existing behavior)

Fixes: https://tracker.ceph.com/issues/72771

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@github-actions github-actions bot added the core label Sep 26, 2025
@adamemerson adamemerson self-assigned this Sep 26, 2025
@phlogistonjohn
Copy link
Contributor

jenkins test container make check

@cbodley cbodley marked this pull request as ready for review September 29, 2025 16:57
@cbodley cbodley requested a review from a team as a code owner September 29, 2025 16:57
@cbodley
Copy link
Contributor Author

cbodley commented Oct 2, 2025

librados did not make use of this is_valid_watch() at all, so was casting cookies directly to LingerOp* and dereferencing. this results in use-after-free for any cookies invalidated by linger_cancel() - for example when called by CB_DoWatchError

i was mistaken about this part, as CB_DoWatchError doesn't call linger_cancel(). the valgrind report in https://tracker.ceph.com/issues/72771#note-2 shows ~CB_DoWatchError dropping the last ref, but it isn't clear what if anything had called linger_cancel() before

@JonBailey1993
Copy link
Contributor

@ljflores ljflores merged commit 0908b59 into ceph:main Nov 21, 2025
17 of 20 checks passed
Objecter::LingerOp *linger_op = reinterpret_cast<Objecter::LingerOp*>(cookie);
boost::intrusive_ptr linger_op = objecter->linger_by_cookie(cookie);
if (!linger_op) {
return -ENOTCONN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cbodley,

This introduced a regression in RBD: librbd asserts that IoCtx::aio_watch() returns 0 because that has been the expected behavior for years meaning that we are experiencing sporadic crashes now. If the linger op is canceled, we expect to get ENOTCONN via the callback, not from the initiating function. Having the initiating function fail this way is prone to inconsistent behavior: in the scenario where the removal of the object on which the watch is established races with a call to IoCtx::aio_watch(), one could get the error from either the initiating function or the callback depending on the outcome of the race. Writing code such that it's prepared to handle both cases is suboptimal and also isn't aligned with neorados which always delivers the error via the callback.

librados did not make use of this is_valid_watch() at all, so was casting cookies directly to LingerOp* and dereferencing. this results in use-after-free for any cookies invalidated by linger_cancel() - for example when called by CB_DoWatchError

I'm not convinced there was a potential for a use-after-free in IoCtx::unwatch() or IoCtx::aio_watch() for a cookie that had gotten invalidated in response to a watch error -- assuming the caller passed a cookie previously obtained from IoCtx::watch(), of course. My understanding has been that Objecter::linger_register() returns a linger op with two references, one of which is intended for the caller of IoCtx::watch() (i.e. the user of the public API). That reference remains alive until the user performs the unwatch. A linger op that is canceled just gets marked as such (LingerOp::canceled bool) and loses the internal reference -- nothing should happen to the external reference. If that is correct, an extra validity check isn't strictly required but just a nice to have to catch totally bogus cookies. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced there was a potential for a use-after-free

sorry for the confusion, i was mistaken about that comment and tried to clarify in #65698 (comment)

If that is correct, an extra validity check isn't strictly required but just a nice to have to catch totally bogus cookies.

this includes cookies that the client already unwatched, so ioctx.unwatch(cookie); ioctx.unwatch(cookie); can lead to undefined behavior

in rgw, our librados::WatchCtx2::handle_error() override schedules calls to reinitialize the watch with ioctx.unwatch() -> ioctx.watch(). it doesn't look like anything prevents that unwatch from racing with our shutdown code that also calls unwatch, so there are still bugs to fix here

regardless, i'd argue that correct use of watch/notify is subtle enough that it's worth detecting these cases in librados to avoid dereferencing invalid memory

apologies for the regression. if you find my argument convincing, i can address the error handling issue in aio_unwatch()

Copy link
Contributor

@idryomov idryomov Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I don't see anything wrong with protecting against duplicate unwatch calls or similar mishaps. I just wanted to ensure that my understanding of the existing code was correct.

From the RBD POV, what we need to move past this is IoCtx::aio_watch() delivering this early ENOTCONN via the callback, just like its neorados counterpart. Is that what you have in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the RBD POV, what we need to move past this is IoCtx::aio_watch() delivering this early ENOTCONN via the callback, just like its neorados counterpart. Is that what you have in mind?

👍 on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants