Skip to content

pacific: do not evict clients if OSDs are laggy#52270

Merged
yuriw merged 8 commits intoceph:pacificfrom
dparmar18:wip-61841-pacific
Aug 21, 2023
Merged

pacific: do not evict clients if OSDs are laggy#52270
yuriw merged 8 commits intoceph:pacificfrom
dparmar18:wip-61841-pacific

Conversation

@dparmar18
Copy link
Contributor

backport tracker: https://tracker.ceph.com/issues/61841
parent tracker: https://tracker.ceph.com/issues/58023
backport of #49971

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 95fbe30)
Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(pacific branch has no path src/common/options/mds.yaml.in but
src/common/options.cc; because of this the content has been picked up from
22e4bcf and added accordingly to
src/common/options.cc)
@dparmar18 dparmar18 requested review from a team as code owners June 30, 2023 11:03
@github-actions github-actions bot added this to the pacific milestone Jun 30, 2023
A client might get unresponsive/laggy due to laggy OSD(s).
This change provides us a way to defer client eviction in
such scenarios

also adds helpers:
- get_laggy_clients()
- clear_laggy_clients()

and call clear_laggy_clients() before calling related
Server methods

Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 31a8d03)
using new MDS health metric

Fixes: https://tracker.ceph.com/issues/58023
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 5a0e8a7)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 51cca9b)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 833aa34)
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 7c8e794)
@dparmar18 dparmar18 force-pushed the wip-61841-pacific branch from 8df59cb to b37beeb Compare June 30, 2023 11:06
@dparmar18 dparmar18 requested a review from a team June 30, 2023 11:07
Copy link
Member

@lxbsz lxbsz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@sseshasa
Copy link
Contributor

sseshasa commented Aug 1, 2023

@dparmar18 The teuthology test https://pulpito.ceph.com/yuriw-2023-07-27_22:37:12-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7354866 failed due to
the MDS_CLIENTS_LAGGY warning. Should this test add the warning to the ignore list? If so, the PR may need additional changes. Please take a look.

I will remove the relevant test tag and 'needs-qa' label for now. Please add the 'needs-qa' label back once the PR is ready for a re-test.

@sseshasa
Copy link
Contributor

sseshasa commented Aug 3, 2023

@dparmar18 The teuthology test https://pulpito.ceph.com/yuriw-2023-07-27_22:37:12-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7354866 failed due to the MDS_CLIENTS_LAGGY warning. Should this test add the warning to the ignore list? If so, the PR may need additional changes. Please take a look.

I will remove the relevant test tag and 'needs-qa' label for now. Please add the 'needs-qa' label back once the PR is ready for a re-test.

@dparmar18 Please note the above failure and see if it needs to be addressed. Once you confirm, this PR can be included in the next QA round. Thanks!

@dparmar18
Copy link
Contributor Author

dparmar18 commented Aug 3, 2023

@dparmar18 The teuthology test https://pulpito.ceph.com/yuriw-2023-07-27_22:37:12-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7354866 failed due to the MDS_CLIENTS_LAGGY warning. Should this test add the warning to the ignore list? If so, the PR may need additional changes. Please take a look.
I will remove the relevant test tag and 'needs-qa' label for now. Please add the 'needs-qa' label back once the PR is ready for a re-test.

@dparmar18 Please note the above failure and see if it needs to be addressed. Once you confirm, this PR can be included in the next QA round. Thanks!

Hey @sseshasa my bad, i totally lost your comment among other PRs. I think @ljflores had a PR that got merged for it #52342, seems like it needs to be backported

@sseshasa
Copy link
Contributor

sseshasa commented Aug 3, 2023

Hey @sseshasa my bad, i totally lost your comment among other PRs. I think @ljflores had a PR that got merged for it #52342, seems like it needs to be backported
@dparmar18 Thanks! @ljflores We can include the backport of #52342 in the batch and run the QA batch.

@vshankar
Copy link
Contributor

jenkins test api

@ljflores
Copy link
Member

ljflores commented Aug 15, 2023

@dparmar18 #53007

You may in fact want to combine that commit with this PR since it will help the API check pass. If you decide to go that route, you can cherry-pick -x my commit on this PR and close the backport PR.

@dparmar18
Copy link
Contributor Author

@dparmar18 #53007

You may in fact want to combine that commit with this PR since it will help the API check pass. If you decide to go that route, you can cherry-pick -x my commit on this PR and close the backport PR.

yeah makes sense, I'll cherry pick it here and close that one. Thanks @ljflores

We expect laggy OSDs in this testing environment,
so it makes sense to disable this warning.

Fixes: https://tracker.ceph.com/issues/61907
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 2322d2c)
(cherry picked from commit 2032e8b)
@ljflores
Copy link
Member

@batrick
Copy link
Member

batrick commented Aug 21, 2023

@dparmar18 #53007
You may in fact want to combine that commit with this PR since it will help the API check pass. If you decide to go that route, you can cherry-pick -x my commit on this PR and close the backport PR.

yeah makes sense, I'll cherry pick it here and close that one. Thanks @ljflores

In the future, update your PR description when you include more fixes.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuriw yuriw merged commit 81bddff into ceph:pacific Aug 21, 2023
@dparmar18
Copy link
Contributor Author

@dparmar18 #53007
You may in fact want to combine that commit with this PR since it will help the API check pass. If you decide to go that route, you can cherry-pick -x my commit on this PR and close the backport PR.

yeah makes sense, I'll cherry pick it here and close that one. Thanks @ljflores

In the future, update your PR description when you include more fixes.

ACK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants