
osd/scrub: fix the handling of interval changes#49959

Closed
ronen-fr wants to merge 7 commits into ceph:main from ronen-fr:wip-rf-58496-cln

Conversation

@ronen-fr
Contributor

@ronen-fr ronen-fr commented Feb 1, 2023

Separating and clarifying the handling of interval termination, new interval starting, and configuration changes:

A primary PG now registers with its OSD for scrubbing only when fully activated: on_pg_activate() called from PG::on_activate();

When the interval ends, the PG is removed from the OSD queue by stop_active_scrubs(), called from start_peering_interval().

Configuration changes are still handled by update_scrub_job().

Signed-off-by: Ronen Friedman rfriedma@redhat.com
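The registration lifecycle described above can be sketched roughly as follows. This is a toy model with invented names (`OsdScrubQueue`, `ToyPg`) standing in for the real Ceph types, not actual Ceph code:

```cpp
#include <cassert>
#include <set>

// Toy model of the lifecycle described in the PR text: a primary PG
// registers in the OSD's scrub queue only once fully activated, and is
// removed from the queue when its interval ends.
struct OsdScrubQueue {
  std::set<int> queued_pgs;  // ids of PGs registered for scrubbing
  void register_pg(int pgid) { queued_pgs.insert(pgid); }
  void remove_pg(int pgid) { queued_pgs.erase(pgid); }
};

struct ToyPg {
  int id;
  bool primary;
  // analogous to on_pg_activate(), called from PG::on_activate()
  void on_activate(OsdScrubQueue& q) {
    if (primary) q.register_pg(id);
  }
  // analogous to stop_active_scrubs(), called from start_peering_interval()
  void on_interval_end(OsdScrubQueue& q) { q.remove_pg(id); }
};
```

Only the primary path registers; a replica activating leaves the queue untouched, matching the "only when fully activated" rule above.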

@ronen-fr ronen-fr marked this pull request as ready for review February 1, 2023 14:09
@ronen-fr ronen-fr requested review from a team as code owners February 1, 2023 14:09
@ronen-fr ronen-fr requested review from aclamk and jdurgin February 1, 2023 14:21
@ronen-fr ronen-fr requested a review from neha-ojha February 1, 2023 16:13
Contributor

@athanatos athanatos left a comment

Seems reasonable structurally, see my comments.

This PR's single commit seems to contain several orthogonal changes. In order to review it, I needed to split it into smaller atomic commits. https://github.com/athanatos/ceph/tree/sjust/wip-scrub-interval-pr-49959 has those commits -- please split up this PR in a similar way (commit messages in my branch are likely insufficient or incorrect, I'm not sure what the motivation was for a few of them). In the future, it would really expedite review if you'd split up your PRs in a similar manner.

  // verify that the 'in_q' status matches our "Primariority"
  if (m_scrub_job && is_primary() && !m_scrub_job->in_queues) {
-   dout(1) << __func__ << " !!! primary but not scheduled! " << dendl;
+   dout(10) << __func__ << " !!! primary but not scheduled! " << dendl;
Contributor

This is validating an invariant that should always hold, right? I would think that derr would be more appropriate.

Contributor Author

It was (that's why I used dout(1) - and should probably have used derr).
I am less sure now (that's why I changed it to 10), and as the logic changes in that
other scrub-sched PR, I didn't invest time in making sure.

Contributor Author

For now, I've left this change out of the PR.

// cleared.
replica_handling_done();
m_fsm->process_event(FullReset{});
if (m_fsm->is_active_replica()) {
Contributor

It seems to me that we probably want FullReset{} to be a valid event regardless of the state of the state machine -- why not add noop handlers so that it's universally valid?
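For illustration only, here is a hand-rolled miniature FSM (invented names, not Ceph's boost::statechart machine) showing what "universally valid" would mean: every state accepts the reset event, and states with nothing to clean up treat it as a no-op:

```cpp
#include <cassert>

// Miniature sketch of a universally valid reset event.  The transition
// to NotActive is legal from every state; state-specific cleanup runs
// only where there is something to clean up.
enum class State { NotActive, ActivePrimary, ActiveReplica };
enum class Event { StartScrub, FullReset };

struct MiniScrubFsm {
  State state = State::NotActive;
  int cleanups = 0;  // counts state-specific cleanup work performed

  void process_event(Event e) {
    switch (e) {
      case Event::StartScrub:
        if (state == State::NotActive) state = State::ActivePrimary;
        break;
      case Event::FullReset:
        // Cleanup only in states that hold resources...
        if (state == State::ActivePrimary || state == State::ActiveReplica)
          ++cleanups;
        // ...but the event itself is accepted everywhere, so callers
        // never need to check the FSM state before submitting it.
        state = State::NotActive;
        break;
    }
  }
};
```

With this shape, FullReset from NotActive is simply a harmless no-op rather than an unexpected event.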

Contributor Author

@ronen-fr ronen-fr Feb 2, 2023

The full_reset event is always effective, regardless of the FSM state (as it was meant
as a way to guarantee a reset).

For the replica case - I like the suggestion of adding a new event to the state machine. Performing
this change now.

For the primary case - that would not be enough to replace the 'role' data I have added here (for the
same reason m_active does not cover the whole timespan covered by is_queued_or_active). Adding an
event for this case only reduces clarity (I have tried that now).

'IntervalEnded' is generated whenever an interval ends while the
scrubber is acting as a replica. The event triggers a reset of
the internal scrubber state, and a scrubber FSM reset to 'not active'.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
m_queued_or_active now encodes additionally whether the
scrubber is queued as a replica or a primary.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
and inline its functionality into its sole caller
(PgScrubber::rm_from_osd_scrubbing())

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
@ronen-fr ronen-fr force-pushed the wip-rf-58496-cln branch 6 times, most recently from cec0f5b to d419324 Compare February 2, 2023 20:35
@ronen-fr
Contributor Author

ronen-fr commented Feb 2, 2023

@athanatos thanks for the work invested, especially re the commit splits

Separating and clarifying the handling of interval termination, new
interval starting, and configuration changes:

A primary PG now registers with its OSD for scrubbing only when fully
activated: on_pg_activate() called from PG::on_activate();

When the interval ends, the PG is removed from the OSD queue by
stop_active_scrubs(), called from start_peering_interval().

Configuration changes are still handled by update_scrub_job().

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
resetting m_no_reply from within causes a crash, and prevents
scrub code testing.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
The calls were removed as publish_stats..() expects a
stable PG state, and that state is not synchronized
with the scrubber operation.

Fixes: https://tracker.ceph.com/issues/58496
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Contributor

@athanatos athanatos left a comment

I think IntervalEnded{} introduces new ways to incorrectly reset the scrubber state. It seems like all cases where we need to reset the scrubber should simplify to submitting a FullReset{} event with the state handlers calling cleanup handlers as needed.

state_clear(PG_STATE_REPAIR);

clear_scrub_reservations();
m_pg->publish_stats_to_osd();
Contributor

The commit message for these removals is:

    osd/scrub: remove some calls to publish_stats_to_osd()
    
    The calls were removed as publish_stats..() expects a
    stable PG state, and that state is not synchronized
    with the scrubber operation.

I don't think this conveys to a future reader of this commit why it's safe to remove them. Either
a) the publish_stats_to_osd() calls are necessary or
b) they aren't.

In the case of a), the fact that they also require a stable PG state merely means that removing them introduces another bug. In the case of b), the fact that they require a stable PG state isn't really relevant.

Presumably, you've audited the code and concluded that they are safe to remove. Please explain that in the commit message.

void ReplicaReservations::send_reject()
{
// stop any pending timeout timer
m_no_reply.reset();
Contributor

The message for this

    osd/scrub: fix replica-reservation timeout cancellation
    
    resetting m_no_reply from within causes a crash, and prevents
    scrub code testing.

doesn't explain how this results in a crash. At first glance, this modification would appear to be a noop as send_reject() is called immediately.

Further inspection shows that ReplicaReservations::handle_no_reply_timeout() also invokes ReplicaReservations::send_reject(). ReplicaReservations::handle_no_reply_timeout() is in turn invoked from ReplicaReservations::no_reply_t::m_abort_callback as populated in no_reply_t(). I suspect that the crash you are seeing is due to m_abort_callback being invoked from the timer thread under the sleep_lock. Invoking m_no_reply.reset() in that context would take the sleep_lock again resulting in an assert failure.

If the above is indeed the cause of the crash, this doesn't appear to be a fix as it fails to clean up m_no_reply in the timeout path.
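The suspected hazard can be shown with a self-contained sketch (invented names; `ToyTimer` is not Ceph's SafeTimer): a timer that holds its lock while running callbacks would deadlock if the callback re-enters cancel(), while releasing the lock before the callback (the safe_callbacks == false style) avoids that:

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Toy sketch of the locking hazard discussed above.  cancel() plays the
// role of m_no_reply.reset(); fire() simulates timeout expiry invoking
// the registered callback.
struct ToyTimer {
  std::mutex lock;
  std::function<void()> callback;
  bool armed = false;
  bool safe_callbacks;  // true: hold the lock while calling back (hazardous here)

  explicit ToyTimer(bool safe) : safe_callbacks(safe) {}

  void arm(std::function<void()> cb) {
    std::lock_guard<std::mutex> g(lock);
    callback = std::move(cb);
    armed = true;
  }

  void cancel() {
    // Takes the timer lock: deadlocks if the caller already holds it.
    std::lock_guard<std::mutex> g(lock);
    armed = false;
  }

  void fire() {
    std::unique_lock<std::mutex> g(lock);
    auto cb = callback;
    armed = false;
    if (!safe_callbacks)
      g.unlock();  // release before calling out, as a safe_callbacks==false timer does
    cb();          // with safe_callbacks==true, a re-entrant cancel() would deadlock
  }
};
```

With safe_callbacks set to false, a callback that cancels its own timer (the send_reject() pattern) completes without re-acquiring the held lock.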

Contributor Author

@athanatos : yes - that was the failed scenario. Checking what you said in the last sentence.

Contributor

I'm now less sure about this diagnosis. SafeTimer can optionally either hold the lock while calling callbacks or not. There's a boolean param passed to the constructor which governs this behavior -- safe_callbacks. sleep_timer specifically seems to be constructed with safe_callbacks set to false, so locking OSDService::sleep_lock shouldn't be a problem.

Can you link to the tracker id?

"{}: interval changed ({} -> {}). Aborting active scrub.",
__func__, m_interval_start, m_pg->get_same_interval_since())
<< dendl;
scrub_clear_state();
Contributor

This diff doesn't happen to contain PgScrubber::scrub_clear_state(), so I comment here.

// uses process_event(), so must be invoked externally
void PgScrubber::scrub_clear_state()
{
  dout(10) << __func__ << dendl;

  clear_pgscrub_state();
  m_fsm->process_event(FullReset{});
}

scrub_clear_state() copied above still uses a FullReset{}. ActiveReplica does happen to have a FullReset{} handler, but its behavior differs from IntervalEnded handler in that it lacks the call to replica_handling_done(). scrub_clear_state() has a few callers:

  1. PgScrubber::verify_against_abort() -- I find this mechanism confusing, tbh, but it does seem to be callable at any time, including when there are outstanding replica scrubs.
  2. PgScrubber::replica_scrub_op -- This is an error case and seems to also be callable while there is outstanding replica state.
  3. PrimaryLogPG::on_change -- I don't really see a reason for this one now that we call into scrubber in PG::on_new_interval(). This also seems like it means that replicas will get IntervalEnded and then FullReset and primaries will get FullReset twice.
  4. PrimaryLogPG::on_shutdown -- Same observations as above, uses FullReset though the scrubber might be a replica.

Introducing IntervalEnded seems to have created a bunch of opportunities for pg state resets to be inconsistent. Instead, I propose the following changes:

  1. Rename PgScrubber::stop_active_scrubs to PgScrubber::on_new_interval, simplify it to just submit a FullReset
  2. Modify the FullReset handlers for every state to call back into cleanup functions specific to their states (or do it in their destructors?)
  3. Drop the call to PgScrubber::clear_scrub_state() from PrimaryLogPG::on_change()
  4. Change PrimaryLogPG::on_shutdown to invoke a PgScrubber::on_shutdown method which submits a FullReset event.
  5. Update other reset pathways to use FullReset as well.

This way, the state machine states handle any state-specific behavior, and reset becomes simply an event which can be done identically from anywhere.

I think this should remove the need for queued_for_role() as well?
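Item (2) of the proposal above can be sketched with invented names as follows: each state's cleanup lives in its destructor, so one universal reset event tears down whatever the current state holds, no matter where it is submitted from:

```cpp
#include <cassert>
#include <memory>

// Toy sketch (not Ceph code) of per-state cleanup via destructors.
// Counters stand in for real cleanup work such as releasing replica
// reservations or dropping collected scrub maps.
struct ScrubCtx {
  int reservations_released = 0;
  int maps_dropped = 0;
};

struct StateBase { virtual ~StateBase() = default; };

struct ReservingReplicas : StateBase {
  ScrubCtx& ctx;
  explicit ReservingReplicas(ScrubCtx& c) : ctx(c) {}
  ~ReservingReplicas() override { ++ctx.reservations_released; }
};

struct ActiveReplica : StateBase {
  ScrubCtx& ctx;
  explicit ActiveReplica(ScrubCtx& c) : ctx(c) {}
  ~ActiveReplica() override { ++ctx.maps_dropped; }
};

struct Machine {
  ScrubCtx ctx;
  std::unique_ptr<StateBase> state;  // null represents NotActive
  // A single reset path: destroying the current state runs exactly the
  // cleanup that state requires, so callers need no state checks.
  void full_reset() { state.reset(); }
};
```

The point of the design is that full_reset() is identical from every call site; the variation lives entirely in the states themselves.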

Contributor Author

Rename PgScrubber::stop_active_scrubs to PgScrubber::on_new_interval, simplify it to just submit a FullReset

@athanatos the problem with that is that the states of "being reserved by a Primary" (and the longer "a message carrying the request to be reserved by a primary was queued") are not handled within the scrubber state machine (it could have been done differently, at least regarding the first, but it's not something I wish to change now)

Contributor

By "a message carrying the request to be reserved by a primary was queued", do you mean the time the message spends between being queued via the messenger Dispatcher interface and when it gets processed by the PG and scrubber? Why is that important?

Contributor

Or, do you mean the time between when replica_scrub_op is called and when the event queued there is processed?

Contributor

@ronen-fr I think it's worth mapping these resources onto the state machine. I've started working on a branch https://github.com/athanatos/ceph/tree/sjust/wip-scrub-interval-pr-49959-2 which does that. I'm still working on it, so it's likely got errors.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ronen-fr ronen-fr marked this pull request as draft March 12, 2023 09:09
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Aug 18, 2023
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Sep 17, 2023