rgw: prevent spurious/lost notifications in the index completion thread#45212

Merged
cbodley merged 2 commits into ceph:master from cbodley:wip-54435
Mar 8, 2022


@cbodley cbodley commented Mar 1, 2022

cbodley: cherry-picked from the wip-rgw-multisite-reshard feature branch after the merge in #45131, so that it can be backported to earlier releases. removed the logging changes in guard_reshard() because they conflict with commit 85848c0, which is specific to multisite reshard

Fixes: https://tracker.ceph.com/issues/54435


this was happening when async completions happened during reshard.
more information about testing:
https://gist.github.com/yuvalif/d526c0a3a4c5b245b9e951a6c5a10517

we also add more logs to the completion manager.
these should allow finding completions left unhandled due to reshards.

Fixes: https://tracker.ceph.com/issues/54435

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>

ljflores commented Mar 5, 2022

jenkins test make check


cbodley commented Mar 7, 2022

@yuvalif seeing some UninitCondition and SyscallParam valgrind errors from this one in https://pulpito.ceph.com/cbodley-2022-03-04_14:14:50-rgw-wip-cbodley-testing-distro-default-smithi/

they're all under RGWIndexCompletionManager::process(), and mostly relate to its mutex and condition variable

Comment on lines +814 to +816
std::thread retry_thread;
std::condition_variable cond;
std::mutex retry_completions_lock;

it may just be that retry_thread's constructor is being called before its cond and retry_completions_lock are?
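The hypothesis above rests on a C++ rule: non-static data members are initialized in declaration order, so a std::thread member declared before the mutex and condition variable can start running before they are constructed. The following is a minimal sketch of one way to avoid that, by declaring the thread last. The class name CompletionManagerSketch, the stop() method, and the stopping/wakeups members are hypothetical and for illustration only; this is not the actual RGWIndexCompletionManager code, nor necessarily the fix applied in this pull request.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch, not the real RGWIndexCompletionManager.
class CompletionManagerSketch {
  // Synchronization state is declared first, so it is fully constructed
  // before the worker thread below is started.
  std::mutex retry_completions_lock;
  std::condition_variable cond;
  bool stopping = false;
  int wakeups = 0;

  // Declared last: members are initialized in declaration order, so the
  // thread only starts once everything it touches already exists.
  std::thread retry_thread;

  // Worker loop stand-in: blocks until stop() is requested.
  void process() {
    std::unique_lock<std::mutex> l(retry_completions_lock);
    cond.wait(l, [this] { return stopping; });
    assert(stopping);
    ++wakeups;
  }

public:
  CompletionManagerSketch() : retry_thread([this] { process(); }) {}

  // Signals the worker, joins it, and returns how many times it woke up.
  int stop() {
    {
      std::lock_guard<std::mutex> l(retry_completions_lock);
      stopping = true;
    }
    cond.notify_all();
    if (retry_thread.joinable()) {
      retry_thread.join();
    }
    return wakeups;
  }

  ~CompletionManagerSketch() { stop(); }
};
```

With the members in the original order (retry_thread first), the same constructor would start process() while retry_completions_lock and cond were still unconstructed, which is consistent with the kind of uninitialized-memory reports valgrind produced. An alternative with the same effect is to leave the declaration order alone and start the thread in the constructor body instead of the member initializer list.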

resolves valgrind issues about RGWIndexCompletionManager::process()
using uninitialized memory

Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley commented Mar 8, 2022
