Skip to content

librados: aio_unwatch() delivers ENOTCONN to AioCompletion#66610

Merged
idryomov merged 5 commits intoceph:mainfrom
cbodley:wip-librados-aio-unwatch-enotconn
Dec 21, 2025
Merged

librados: aio_unwatch() delivers ENOTCONN to AioCompletion#66610
idryomov merged 5 commits intoceph:mainfrom
cbodley:wip-librados-aio-unwatch-enotconn

Conversation

@cbodley
Copy link
Contributor

@cbodley cbodley commented Dec 11, 2025

94f42b6 added a new error condition to IoCtx::aio_unwatch() that callers aren't prepared to handle. instead of returning that error directly, report it asynchronously to the AioCompletion

Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@cbodley
Copy link
Contributor Author

cbodley commented Dec 11, 2025

i couldn't find any existing examples of aio_* functions reporting immediate errors this way, but i think the logic is correct:

  • if there's a callback attached, we'll call it
  • if the caller uses c->wait_for_complete(), setting c->complete = true will cause that to complete immediately without requiring us to notify the condition variable

@adamemerson
Copy link
Contributor

i couldn't find any existing examples of aio_* functions reporting immediate errors this way, but i think the logic is correct:

I use this helper class to keep track of the state machine and not lose stuff:

template<typename T>

And signal the AioCompletion in an initiating function here:

Pusher::complete(std::move(p), -E2BIG);

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM. I did some limited testing with the RBD suite and the sporadic crashes noted in #65698 (comment) are gone.

@cbodley cbodley force-pushed the wip-librados-aio-unwatch-enotconn branch from bcc072c to 8fe58c3 Compare December 12, 2025 16:04
@idryomov
Copy link
Contributor

I did some limited testing with the RBD suite and the sporadic crashes noted in #65698 (comment) are gone.

The full RBD run looks good too: #66581 (comment)

@cbodley
Copy link
Contributor Author

cbodley commented Dec 15, 2025

https://pulpito.ceph.com/cbodley-2025-12-12_17:58:41-rgw-wip-librados-aio-unwatch-enotconn-distro-default-gibba/

i can't view the results due to the lab move, but this shouldn't block merge given that rbd qa has passed

@cbodley
Copy link
Contributor Author

cbodley commented Dec 15, 2025

jenkins test api

94f42b6 added a new error condition to
IoCtx::aio_unwatch() that callers aren't prepared to handle. instead of
returning that error directly, report it asynchronously to the
AioCompletion

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
before this change set, linger_register() returned a raw LingerOp
pointer with an implicit reference for the caller. for librados,
this implicit reference is only dropped when the corresponding
unwatch() calls linger_cancel()

after commit 94f42b6 introduced
linger_by_cookie(), unwatch() no longer has a safe way to drop this
implicit reference. to prevent LingerOp leaks when unwatch() returns
ENOTCONN, we can't hold this implicit reference count until unwatch()

linger_register() now returns an explicit reference to the caller as
intrusive_ptr<LingerOp>. this helps to guarantee that this reference
count gets dropped before the completion of watch()/aio_watch()

because linger_register() no longer acquires an implicit reference for
the caller, linger_cancel() no longer drops it with info->put()

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-librados-aio-unwatch-enotconn branch from dc59da5 to 1b0f873 Compare December 16, 2025 15:55
@idryomov
Copy link
Contributor

jenkins test windows

1 similar comment
@idryomov
Copy link
Contributor

jenkins test windows

@idryomov idryomov merged commit 08282e4 into ceph:main Dec 21, 2025
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants