
osd: Try other PGs when reservation failures occur #38853

Merged
dzafman merged 2 commits into ceph:master from dzafman:wip-48843 on Mar 12, 2021

Conversation

@dzafman (Contributor) commented Jan 11, 2021

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@dzafman (Contributor, Author) commented Jan 12, 2021

I removed the local reservation pre-check. This makes the extremely rare "local reserve failed" case a normal scenario. These log lines show that things are working as expected; a simplified sketch of the scheduling loop follows the log excerpt.

2021-01-12T10:11:25.925-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 local reserve failed, nothing to be done now

2021-01-12T10:11:28.949-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:28.949-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 local reserve failed, nothing to be done now

2021-01-12T10:11:30.909-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:30.909-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped

2021-01-12T10:11:32.898-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:32.898-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:32.898-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped

2021-01-12T10:11:34.890-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:34.890-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:34.890-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped
2021-01-12T10:11:34.890-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.18 reserve failed, skipped

2021-01-12T10:11:37.946-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:37.946-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:37.946-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped
2021-01-12T10:11:37.946-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.18 reserve failed, skipped

2021-01-12T10:11:39.938-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 local reserve failed, nothing to be done now

2021-01-12T10:11:42.006-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:42.006-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 local reserve failed, nothing to be done now

2021-01-12T10:11:43.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:43.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:43.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 local reserve failed, nothing to be done now

2021-01-12T10:11:45.034-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:45.034-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:45.034-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped

2021-01-12T10:11:46.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.8 reserve failed, skipped
2021-01-12T10:11:46.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:46.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped
2021-01-12T10:11:46.042-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.18 reserve failed, skipped

2021-01-12T10:11:50.046-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 local reserve failed, nothing to be done now

2021-01-12T10:11:51.034-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:51.034-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 local reserve failed, nothing to be done now

2021-01-12T10:11:52.070-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:52.070-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped
2021-01-12T10:11:52.070-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.18 local reserve failed, nothing to be done now

2021-01-12T10:11:54.078-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.10 reserve failed, skipped
2021-01-12T10:11:54.078-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.16 reserve failed, skipped
2021-01-12T10:11:54.078-0800 7f52d1efd700 20 osd.5 37 sched_scrub pg  1.18 reserve failed, skipped
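For readers following the log excerpt, here is a minimal C++ sketch of the skip-and-retry behaviour it shows. This is an illustration only, not the actual OSD::sched_scrub() implementation; PG, try_reserve_local(), start_scrub() and the reserve_failed member are placeholder names invented for this sketch.

// Hypothetical, simplified model of the scheduling loop behind the log lines
// above -- not the real Ceph code.
#include <vector>

struct PG {
  bool reserve_failed = false;                 // set after a failed reservation
  bool try_reserve_local() { return false; }   // stub: pretend the OSD is full
  void start_scrub() {}                        // stub
};

void sched_scrub(std::vector<PG*>& scrub_queue)
{
  bool all_skipped = true;
  for (PG* pg : scrub_queue) {
    if (pg->reserve_failed) {
      // "reserve failed, skipped": this PG failed earlier in the cycle,
      // so move on and give another PG a chance.
      continue;
    }
    all_skipped = false;
    if (!pg->try_reserve_local()) {
      // "local reserve failed, nothing to be done now": the OSD has no free
      // scrub slot, so no other PG can proceed either -- stop for this tick.
      pg->reserve_failed = true;
      break;
    }
    // Replica reservations are requested asynchronously; if a replica refuses,
    // the flag is set from the scrub state machine instead.
    pg->start_scrub();
    break;
  }
  if (all_skipped) {
    // A full pass found only previously-failed PGs: clear the flags so the
    // next scheduling tick starts the cycle over.
    for (PG* pg : scrub_queue)
      pg->reserve_failed = false;
  }
}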

@ronen-fr (Contributor) left a comment

I would recommend not spreading the "local resource locking" handling outside of
PgScrubber::m_local_osd_resource


[[nodiscard]] bool is_scrub_active() const final { return m_active; }

[[nodiscard]] bool get_reserve_failed() const final { return m_reserve_failed; }
@ronen-fr (Contributor) commented on the lines above:

Should be part of the m_local_osd_resource object.

Seems to me that the clearing API shouldn't exist.

@dzafman (Contributor, Author) replied:

This flag represents a temporary state of the PG: once other scrubs finish, reservations can succeed again. It is a way to cycle through all possible scrubs to find one that can run now. We have to keep trying them in order, so after one full cycle we clear all the flags and start over.

@dzafman marked this pull request as ready for review on January 13, 2021 15:53
@ronen-fr (Contributor) commented:

Not sure I understand the idea: if it is to mark PGs that failed to reserve their replicas - good. And calling set_reserve_failed at scrub_machine.cc:99 makes sense. But then why also set it when failing to reserve the local OSD (PG.cc:1364)? If the local OSD is overwhelmed, there is no point in trying the other PGs we manage.

@dzafman (Contributor, Author) commented Jan 14, 2021

> Not sure I understand the idea: if it is to mark PGs that failed to reserve their replicas - good. And calling set_reserve_failed at scrub_machine.cc:99 makes sense. But then why also set it when failing to reserve the local OSD (PG.cc:1364)? If the local OSD is overwhelmed, there is no point in trying the other PGs we manage.

Most of the time, if local reservations aren't available, the osd/OSD.cc:7569 pre-check (which doesn't grab any reservations) bails out of OSD::sched_scrub() early. In the very rare race where the last reservation is grabbed by a remote scrub before we try to get a local reservation, the reserve_failed flag set at PG.cc:1364 is detected at osd/OSD.cc:7652 and we can just jump out of the loop.
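
To make the race concrete, here is a hedged, self-contained sketch of the flow described above. can_inc_scrubs_precheck(), inc_scrubs_local() and the PG struct are placeholder names for this illustration, not the exact OSD/PG API.

// Illustrative only -- the real logic lives in osd/OSD.cc and PG.cc.
struct PG {
  bool reserve_failed = false;
};

// Cheap check that grabs no reservation (the OSD.cc:7569 pre-check).
static bool can_inc_scrubs_precheck() { return true; }

// Actually takes a local reservation (the PG.cc:1364 path); it can fail if a
// remote primary's reservation consumed the last slot after the pre-check.
static bool inc_scrubs_local() { return false; }

static bool try_schedule_scrub(PG* pg)
{
  if (!can_inc_scrubs_precheck())
    return false;                 // common case: fast out of OSD::sched_scrub()

  if (!inc_scrubs_local()) {
    pg->reserve_failed = true;    // rare race: the flag is detected at
    return false;                 // OSD.cc:7652, which jumps out of the loop
  }
  return true;
}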

@ronen-fr (Contributor) replied:

>> Not sure I understand the idea: if it is to mark PGs that failed to reserve their replicas - good. And calling set_reserve_failed at scrub_machine.cc:99 makes sense. But then why also set it when failing to reserve the local OSD (PG.cc:1364)? If the local OSD is overwhelmed, there is no point in trying the other PGs we manage.

> Most of the time, if local reservations aren't available, the osd/OSD.cc:7569 pre-check (which doesn't grab any reservations) bails out of OSD::sched_scrub() early. In the very rare race where the last reservation is grabbed by a remote scrub before we try to get a local reservation, the reserve_failed flag set at PG.cc:1364 is detected at osd/OSD.cc:7652 and we can just jump out of the loop.

I still don't see what we gain by setting the flag on a local reservation failure, but I'd be happy to take this discussion offline.

@tchaikov (Contributor) commented:

jenkins test make check

@dzafman (Contributor, Author) commented Jan 15, 2021

jenkins test make check

@dzafman requested review from neha-ojha and ronen-fr on January 16, 2021 00:24
goto out;
}
// If this is set now we must have had a local reserve failure, so can't scrub anything right now
if (pg->get_reserve_failed()) {
@dzafman (Contributor, Author) commented on the lines above:

@ronen-fr This check is why I mark a PG for local reservation failure too.

@dzafman marked this pull request as draft on January 29, 2021 19:07
@github-actions bot commented Mar 3, 2021

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

dzafman added 2 commits March 5, 2021 11:41
Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman marked this pull request as ready for review on March 5, 2021 20:32
@neha-ojha (Member) commented:

jenkins test make check

@dzafman (Contributor, Author) commented Mar 10, 2021

https://pulpito.ceph.com/nojha-2021-03-06_16:46:51-rados-wip-zafman-testing-distro-basic-smithi/

This run had an error occur 55(62) times; it is being investigated, but it blocked many tests.

@dzafman merged commit 8bc0719 into ceph:master on Mar 12, 2021
@dzafman deleted the wip-48843 branch on March 12, 2021 17:20