Skip to content

rgw/notifications: replace timer waiter with async waiter#63986

Merged
yuvalif merged 2 commits intoceph:mainfrom
yuvalif:wip-yuval-71390
Jul 7, 2025
Merged

rgw/notifications: replace timer waiter with async waiter#63986
yuvalif merged 2 commits intoceph:mainfrom
yuvalif:wip-yuval-71390

Conversation

@yuvalif
Copy link
Contributor

@yuvalif yuvalif commented Jun 17, 2025

this should allow for proper shutdown of the queue handling code of persistent notifications.

Fixes: https://tracker.ceph.com/issues/71390

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@github-actions github-actions bot added the rgw label Jun 17, 2025
@yuvalif
Copy link
Contributor Author

yuvalif commented Jun 17, 2025

@yuvalif
Copy link
Contributor Author

yuvalif commented Jun 17, 2025

rerun: https://pulpito.ceph.com/yuvalif-2025-06-17_16:42:46-rgw:notifications-wip-yuval-71390-distro-default-smithi/
is showing valgrind failures and known failure with multiple kafka brokers

@yuvalif
Copy link
Contributor Author

yuvalif commented Jun 23, 2025

jenkins test api

@yuvalif yuvalif requested review from adamemerson and cbodley June 23, 2025 09:09
@yuvalif yuvalif marked this pull request as ready for review June 23, 2025 09:09
@yuvalif yuvalif requested a review from a team as a code owner June 23, 2025 09:09
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor

cbodley commented Jul 2, 2025

sorry, it's hard for me to tell by reading the diff - if we're reducing the duration of sleeps, does that also mean the idle behavior is polling more aggressively? i think you mentioned in slack that was reduced from 30s to 2s

@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 2, 2025

sorry, it's hard for me to tell by reading the diff - if we're reducing the duration of sleeps, does that also mean the idle behavior is polling more aggressively? i think you mentioned in slack that was reduced from 30s to 2s

i changed the sleep to 1s. but the probing time is the same. e.g.
https://github.com/ceph/ceph/pull/63986/files#diff-e05836df60adf79bd9d629146b894bcd2ac828ea23a63c324e87b20cf7650d72R288-R294

if not enough time passed, i'm just getting to the 1s sleep again. there will be some CPU increase, but should be minor, and should be no IO increase

@cbodley
Copy link
Contributor

cbodley commented Jul 2, 2025

if not enough time passed, i'm just getting to the 1s sleep again. there will be some CPU increase, but should be minor, and should be no IO increase

got it, thanks. waking up once a second to check for shutdown sounds fine 👍

eventually we can look at coroutine cancellation, but that would probably need larger design changes (coroutines that spawn other detached coroutines would need a way to forward cancellation signals)

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving again with shutdown changes

@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 3, 2025

jenkins test api

@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 3, 2025

jenkins test make check

@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 3, 2025

jenkins test make check arm64

yuvalif added 2 commits July 3, 2025 15:46
this should allow for proper shutdown of the queue handling
code of persistent notifications.

Fixes: https://tracker.ceph.com/issues/71963

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
* use short timers even for the longer timeouts
* allow graceful shutdown when stopping

Fixes: https://tracker.ceph.com/issues/71963

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
@yuvalif yuvalif force-pushed the wip-yuval-71390 branch from 119771f to 433717a Compare July 3, 2025 15:47
@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 7, 2025

jenkins test make check

@yuvalif
Copy link
Contributor Author

yuvalif commented Jul 7, 2025

notification tests are failing on expected issues: https://pulpito.ceph.com/yuvalif-2025-07-03_13:51:11-rgw:notifications-wip-yuval-71390-distro-default-smithi/

  • kafka failover
  • valgrind issue

regression: https://pulpito.ceph.com/yuvalif-2025-07-03_15:37:40-rgw-wip-yuval-71390-distro-default-smithi/
is failing with known issues:

  • kafka failover
  • valgrind issue in notifications
  • d4n
  • multisite
  • SSHException

@yuvalif yuvalif merged commit a217c0e into ceph:main Jul 7, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants