pybind/mgr/volumes: avoid deadlock in ceph-mgr Finisher thread #40316

Merged
tchaikov merged 4 commits into ceph:master from batrick:i49605 on Apr 3, 2021

@batrick batrick commented Mar 22, 2021

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@batrick batrick added the cephfs (Ceph File System) and DNM labels Mar 22, 2021
@batrick batrick force-pushed the i49605 branch 5 times, most recently from d20babf to 22e4c98 Compare March 23, 2021 01:50
@batrick batrick force-pushed the i49605 branch 2 times, most recently from df3ea3c to 4b5dc62 Compare March 23, 2021 03:18
@batrick batrick force-pushed the i49605 branch 2 times, most recently from 10c9484 to 9cab094 Compare March 24, 2021 01:25
batrick added 3 commits March 24, 2021 11:37
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Hunting [1].

[1] https://tracker.ceph.com/issues/49605
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Perform thread count updates in a dedicated tick thread. This prevents
the mgr Finisher thread from potentially hanging on a mutex deadlock in
the cloner thread management.

Fixes: https://tracker.ceph.com/issues/49605
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
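
The approach described in the commit message above can be illustrated with a minimal sketch. This is not the code from this PR: `WorkerPool`, `reconfigure`, `_tick`, and `tick_interval` are hypothetical names chosen for illustration. The assumption is that the caller (playing the Finisher thread's role) only records the desired thread count without taking the pool mutex, and a separate tick thread applies the change under the mutex later.

```python
import threading
import time


class WorkerPool:
    """Illustrative pool; names are hypothetical, not taken from ceph-mgr."""

    def __init__(self, nr_threads, tick_interval=0.05):
        self.lock = threading.Lock()
        self.wanted = nr_threads          # desired thread count
        self.workers = []                 # (thread, stop_event), guarded by self.lock
        self.stopping = threading.Event()
        self.tick_interval = tick_interval
        with self.lock:
            self._resize(nr_threads)
        self.ticker = threading.Thread(target=self._tick, daemon=True)
        self.ticker.start()

    def reconfigure(self, nr_threads):
        # Called from the thread that must never block. It only records
        # the request and does not touch self.lock.
        self.wanted = nr_threads

    def _tick(self):
        # Dedicated thread: the only place that resizes after startup.
        # If self.lock is stuck, only this thread hangs, not the caller.
        while not self.stopping.wait(self.tick_interval):
            with self.lock:
                self._resize(self.wanted)

    def _resize(self, target):
        # Caller must hold self.lock.
        while len(self.workers) < target:
            stop = threading.Event()
            t = threading.Thread(target=self._work, args=(stop,), daemon=True)
            self.workers.append((t, stop))
            t.start()
        while len(self.workers) > target:
            _, stop = self.workers.pop()
            stop.set()

    def _work(self, stop):
        while not stop.wait(0.01):
            pass  # placeholder for fetching and running a job

    def shutdown(self):
        self.stopping.set()
        self.ticker.join()
        with self.lock:
            self._resize(0)
```

In this sketch, a call to `reconfigure` returns immediately even while another thread holds `lock`; the resize simply happens on a later tick once the mutex is free.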
@batrick batrick marked this pull request as ready for review March 24, 2021 18:52
@batrick batrick requested review from kotreshhr and vshankar March 24, 2021 18:52
@batrick batrick removed the DNM label Mar 24, 2021
vshankar (Contributor) commented

@batrick changes look good. hoping to catch potential locking issues...

batrick commented Mar 26, 2021

https://pulpito.ceph.com/pdonnell-2021-03-24_23:26:35-fs-wip-pdonnell-testing-20210324.190252-distro-basic-smithi/

batrick commented Mar 26, 2021

Ready to merge.

There is a hang in get_job which is holding the mutex [1]. This debug
output is meant to help find this issue in upstream QA logs.

[1] https://tracker.ceph.com/issues/49605#note-5
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
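
As a rough illustration of the kind of debug output described in the commit message above, the sketch below logs around a mutex acquisition so that a hang shows up in logs as an "acquiring" line with no matching "acquired" line. It is not the logging added by this PR: `traced_lock` and the logger name are hypothetical, and only the standard `logging`, `threading`, and `contextlib` modules are used.

```python
import contextlib
import logging
import threading

# Hypothetical logger name, for illustration only.
log = logging.getLogger("lock_debug_sketch")


@contextlib.contextmanager
def traced_lock(lock, name):
    """Log before and after taking `lock`, and just before releasing it."""
    log.debug("%s: acquiring", name)
    with lock:
        log.debug("%s: acquired", name)
        try:
            yield
        finally:
            log.debug("%s: releasing", name)
```

A caller would wrap the critical section as `with traced_lock(mutex, "get_job"): ...`; the label string is arbitrary.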
batrick commented Mar 30, 2021

Thanks @tchaikov, I've adopted most of your suggestions.

kotreshhr (Contributor) commented

jenkins test api
