pacific: do not evict clients if OSDs are laggy #52270
Fixes: https://tracker.ceph.com/issues/58023 Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 95fbe30)
Fixes: https://tracker.ceph.com/issues/58023 Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (The pacific branch has no src/common/options/mds.yaml.in; it uses src/common/options.cc instead, so the content was taken from 22e4bcf and added to src/common/options.cc accordingly.)
A client might become unresponsive/laggy due to laggy OSD(s). This change provides a way to defer client eviction in such scenarios. It also adds helpers:
- get_laggy_clients()
- clear_laggy_clients()
and calls clear_laggy_clients() before calling related Server methods.
Fixes: https://tracker.ceph.com/issues/58023 Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 31a8d03)
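The deferral described in this commit message can be pictured with a minimal, standalone sketch. This is not the MDS code from the PR: `ClientSession`, `LaggyClientTracker`, `find_evictable`, `run_eviction_pass`, and the two boolean parameters are hypothetical names chosen for illustration. Only the helper names `get_laggy_clients()` and `clear_laggy_clients()` come from the commit message, and their real signatures may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for an MDS client session; not a Ceph type.
struct ClientSession {
  std::uint64_t id;
  bool stale;  // true once the client has stopped responding in time
};

// Illustrative tracker. Only the two helper names below are taken from the
// commit message; everything else is an assumption made for this sketch.
class LaggyClientTracker {
public:
  // Returns the ids of stale clients that should be evicted now. When OSDs
  // are laggy and deferral is enabled, stale clients are recorded as laggy
  // instead of being returned for eviction.
  std::vector<std::uint64_t> find_evictable(
      const std::vector<ClientSession>& sessions,
      bool osds_laggy,
      bool defer_on_laggy_osds) {
    std::vector<std::uint64_t> to_evict;
    for (const auto& s : sessions) {
      if (!s.stale) {
        continue;  // responsive clients are never touched
      }
      if (osds_laggy && defer_on_laggy_osds) {
        laggy_clients_.insert(s.id);  // defer: remember, do not evict
      } else {
        to_evict.push_back(s.id);
      }
    }
    return to_evict;
  }

  const std::set<std::uint64_t>& get_laggy_clients() const {
    return laggy_clients_;
  }

  void clear_laggy_clients() { laggy_clients_.clear(); }

private:
  std::set<std::uint64_t> laggy_clients_;
};

// One eviction pass: the laggy set is cleared first, mirroring the commit
// message's "call clear_laggy_clients() before calling related Server
// methods", so each pass starts from a fresh view.
std::vector<std::uint64_t> run_eviction_pass(
    LaggyClientTracker& tracker,
    const std::vector<ClientSession>& sessions,
    bool osds_laggy,
    bool defer_on_laggy_osds) {
  tracker.clear_laggy_clients();
  assert(tracker.get_laggy_clients().empty());
  return tracker.find_evictable(sessions, osds_laggy, defer_on_laggy_osds);
}
```

In this sketch, a pass that runs while OSDs are laggy evicts nobody and leaves the stale client ids in `get_laggy_clients()`. A later pass with healthy OSDs clears that set and returns the same ids for eviction.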
using new MDS health metric Fixes: https://tracker.ceph.com/issues/58023 Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 5a0e8a7)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 51cca9b)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 833aa34)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com> (cherry picked from commit 7c8e794)
@dparmar18 The teuthology test https://pulpito.ceph.com/yuriw-2023-07-27_22:37:12-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7354866 failed due to I will remove the relevant test tag and 'needs-qa' label for now. Please add the 'needs-qa' label back once the PR is ready for a re-test.
@dparmar18 Please note the above failure and see if it needs to be addressed. Once you confirm, this PR can be included in the next QA round. Thanks!
Hey @sseshasa, my bad, I totally lost your comment among other PRs. I think @ljflores had a PR that got merged for it, #52342; it seems like it needs to be backported.
jenkins test api |
You may in fact want to combine that commit with this PR since it will help the API check pass. If you decide to go that route, you can |
Yeah, makes sense. I'll cherry-pick it here and close that one. Thanks @ljflores
We expect laggy OSDs in this testing environment, so it makes sense to disable this warning. Fixes: https://tracker.ceph.com/issues/61907 Signed-off-by: Laura Flores <lflores@redhat.com> (cherry picked from commit 2322d2c) (cherry picked from commit 2032e8b)
In the future, update your PR description when you include more fixes.
ACK
backport tracker: https://tracker.ceph.com/issues/61841
parent tracker: https://tracker.ceph.com/issues/58023
backport of #49971