osd: Extracting scrub-related code from the PG & friends by ronen-fr · Pull Request #38111 · ceph/ceph

ronen-fr · 2020-11-16T16:24:31Z

Objectives:

- identify and extract scrub code, as a first step into sharing scrub code between classic
  and Crimson implementations;
- refactor the code as needed for clarity;

Main changes:

- PgScrubber (pg_scrubber.{h,cc}) implements the scrub code that resided in PG.cc. And:

- PrimaryLogScrub.{h,cc} now implement the parts of the scrub code that resided in PrimaryLogPG.
  (note that I didn't see much point in the separation, but changing that was outside the scope of this PR)

- The flags that govern scrub scheduling/operation were separated into two structures: one is maintained
  by the PG, and details the "next scrub" parameters. The other is PgScrubber-owned, and determines the
  operation options for the running scrub. This collection of flags is set (and is "frozen") at
  PG:queue_scrub(), following the logic of the existing code.

- the scrubbing is now implemented as an explicit state-machine.
- and the events that trigger state changes are now translated into separate specific messages, that
  are initiated using specific calls;
- the priorities used when queueing and handling those events were reworked, trying to match the original
  semantics.

- Scrub-related resources (the OSD-owned 'local' and 'remote' scrub resources) are managed using RAII
  wrappers, hopefully guaranteeing no leakage.

Design decisions:

- the new scrub FSM (in scrub_machine.*) does not contain PG-specific code. All code that requires knowledge
  of the PG internals (or PrimaryLogPG internals) is in PgScrubber & PrimaryLogScrub. That, to achieve:
  - easy switching from (the slow) boost::state_chart to something better (I promised the SML author to give
    it a second try, after he has incorporated some needed changes);
  - ease of adaptation to Crimson;
  - clearer separation of concerns.

This PR is a re-structured version of #37317, with the changes logically divided into separate commits for the reviewers' sake.

dzafman · 2020-11-17T03:33:03Z

@ronen-fr The qa/standalone/scrub/osd-scrub-test.sh TEST_deep_scrub_abort() test fails on my build machine using this code at bc22448.

dzafman · 2020-11-17T03:45:07Z

@neha-ojha @ronen-fr There is a race with setting noscrub or nodeep-scrub, see https://tracker.ceph.com/issues/47767. The PG::abort_scrub() function doesn't account for messages in-flight from scrub replicas. Wondering if you can deal with this better in the state machine version. Coincidentally, the failure above is exactly with the test setting nodeep-scrub.

dzafman · 2020-11-17T03:47:12Z

@ronen-fr I started reviewing this version of the code. I'll be out tomorrow (Tue) back the next day.

ronen-fr · 2020-11-17T13:21:58Z

There is a race with setting noscrub or nodeep-scrub, see https://tracker.ceph.com/issues/47767. The PG::abort_scrub() function doesn't account for messages in-flight from scrub replicas. Wondering if you can deal with this better in the state machine version. Coincidentally, the failure above is exactly with the test setting nodeep-scrub.

@dzafman, thanks for the comments.

Seems to me that the two issues are unrelated:

there is that race condition you mention. In the new code the chances of this scenario causing a crash is low: PrimaryLogPG::do_replica_scrub_map() checks is_scrub_active() before calling PgScrubber::map_from_replica().

Now - there is still a theoretical issue if the primary terminated the scrub and started a new one while the message was in-flight. MapsCollectionStatus::mark_arriving_map() will then return 'false', causing map_from_replica() to assert. I would suggest removing that specific assert. I do not think it serves a positive purpose.

there is another consistent problem w/ ocb-scrub-test.sh: setting 'ceph osd set $stopscrub' , for the deep-scrub case, does not prevent a regular scrub from starting immediately after the deep-scrub was aborted. The test does not cater for that. In my local tests, I have modified the deep-scrub case to also enable/disable regular scrubs. I would welcome a better suggestion.

src/osd/PrimaryLogPG.cc

src/osd/scrub_machine.h

src/osd/pg_scrubber.cc

src/osd/OSD.cc

src/osd/OSD.h

src/osd/scrubber_common.h

src/osd/pg_scrubber.cc

src/osd/pg_scrubber.h

src/osd/pg_scrubber.cc

src/osd/pg_scrubber.h

…espace) Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

into a PrimaryLogScrub - a derivative of PgScrubber. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Fixing Crimson compilation. Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

(following code-review comments) main changes: - incoming scrub events are now validated by the scrubber before being sent to the SFM: - messages from a previous intervals are discarded, possibly signalling a scrub of the current scrub and full scrubber reset; - abort newly mandated by a configuration change is acted upon; - stale messages from previous scrub sessions are discarded; - replica reservations are silently discarded on interval changes - some modifications to the state diagram Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

dismissed

tchaikov · 2020-12-11T08:16:20Z

failures are not related.

ronen-fr · 2020-12-11T18:28:01Z

Something strange is happening to our perf test...
"Started 1d 3h 30m 46s ago"

ronen-fr · 2020-12-12T07:44:34Z

jenkins test classic perf

athanatos

I'm going to go ahead and merge this version, please send a follow up PR handling the last few comments here.

athanatos · 2020-12-14T22:20:10Z

src/osd/pg_scrubber.cc

+      // we have not seen this interval change yet.
+      // The remote reservations are no longer relevant.
+
+      m_last_dead_interval = current_interval;


I think this variable and related logic is entirely unnecessary. When we saw the map reflecting the interval change, we called PrimaryLogPG::on_change which called PGScrubber::clear_pgscrub_state which cleared the reservations.

turns out on_change() is too late in the process.
Looking now for the earliest location to hook the 'no messages' reservations clearing

athanatos · 2020-12-14T22:20:22Z

src/osd/pg_scrubber.cc

+}
+
+// returns false if the message should be discarded. Handles the notification of interval
+// change, if not done already. called only for active scrub? not sure.


Please answer this question.

modified (separate PR)

athanatos · 2020-12-14T22:20:59Z

src/osd/pg_scrubber.cc

+
+  if (epoch_to_verify < current_interval) {
+    // the event will not be delivered. If we have already noticed and handled
+    // the change of seasons, it will be silently discarded. Otherwise - we


We must have already handled the interval change, or get_same_interval_since() wouldn't be updated yet.

jdurgin · 2020-12-15T01:25:45Z

scrub: extract from PG etc. into state machine

jdurgin · 2020-12-15T01:25:47Z

scrub: extract from PG etc. into state machine

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. Current PR fixes that. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326)

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326) Conflicts: src/osd/pg_scrubber.cc Conflict resolved by removing a clear_queued_or_active() call that was dragged in.

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326) Conflicts: src/osd/pg_scrubber.cc Conflict resolved by removing a clear_queued_or_active() call that was dragged in. (cherry picked from commit b010892) Resolves: rhbz#2068531

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326) Conflicts: src/osd/pg_scrubber.cc Conflict resolved by removing a clear_queued_or_active() call that was dragged in.

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com>

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326) Conflicts: src/osd/pg_scrubber.cc Conflict resolved by removing a clear_queued_or_active() call that was dragged in. (cherry picked from commit b010892) Resolves: rhbz#2068531

Snap trimming that was postponed as the target PG was scrubbing must be restarted at scrub completion. PR ceph#38111 moved trimming restart to just before the scrub fully terminated. The current PR fixes that. Trimming is also restarted in those cases where scrub was queued but aborted immediately. Fixes: https://tracker.ceph.com/issues/52026 Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 948d326) Conflicts: src/osd/pg_scrubber.cc Conflict resolved by removing a clear_queued_or_active() call that was dragged in.

github-actions bot added the core label Nov 16, 2020

ronen-fr requested review from adamemerson, athanatos, dzafman, jdurgin, neha-ojha and rzarzynski November 16, 2020 16:26

ronen-fr requested a review from a team as a code owner November 16, 2020 18:05

github-actions bot added the crimson label Nov 16, 2020