osd: fix the scrubber behavior on multiple preemption attempts #39145

neha-ojha merged 2 commits into ceph:master
Conversation
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
jenkins retest this please
     */
    void PgScrubber::add_delayed_scheduling()
    {
      m_end = m_start;  // not blocking any range now
I don't understand this one. Perhaps this should be set when we complete a chunk?
Yes to your question: this is what we would have done had we been able to complete the chunk. But as we were preempted:

There are two paths for preemption:
- Locally, which means we are either in state BuildMap or in WaitReplicas: we won't be scrubbing anything, but we still maintain the object range. Setting 'end' to 'start' below (l. 591) frees that block. It is also solved by the other change - we no longer block preemption attempts (but that path is slower).
- A replica might signal in its returning message that it was preempted (and will not be providing a map). The primary may still be working on creating its own map. But then we notice the preemption, and while we (unlike the original code) do not retry selecting a range immediately (we wait for all replicas, then spend time requeuing via PendingTimer), at least we allow multiple preemptions. And - once we get to PendingTimer - there is no reason for us to maintain our hold on the old chunk.
> I don't understand this one. Perhaps this should be set when we complete a chunk?
m_start = m_end would be to complete a chunk;
m_end = m_start when aborting/preempting the current chunk, to stop blocking the range [m_start, m_end).
src/osd/pg_scrubber.cc
    // signal the preemption
    preemption_data.do_preempt();
    m_end = m_start;  // free the range we were scrubbing
Is this the only caller of do_preempt()?
See previous comment.
Latest scrub code creates a time window in which a specific scrub is marked as "preempted", but further preemptions are prohibited. Write operations arriving in that window are blocked but not requeued on time.

Fixes: https://tracker.ceph.com/issues/48793
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Force-pushed from a864dc4 to 4cdb293.
https://pulpito.ceph.com/rfriedma-2021-01-28_20:14:16-rados-wip-ronenf-scrub-48793-distro-basic-smithi/

Waiting for longer tests.
    // otherwise - write requests arriving while 'already preempted' is set
    // but 'preemptable' is not - will not be allowed to continue, and will
    // not be requeued on time.
    return false;
It feels dangerous to ignore m_start -> m_end blocking range, but I assume that we are guaranteeing that the current scrub chunk will be aborted. Somewhere m_end = m_start might happen before eventually restarting that chunk or a smaller chunk at m_start.
will verify. Thanks
I don't see a scenario in which we have a problem with modifying 'end' in that direction (i.e. to be equal to 'start'). The current chunk will be aborted, and until that happens, even the backend building of the local map is using its own copy of the start and end markers.
https://pulpito.ceph.com/nojha-2021-02-01_21:31:14-rados-wip-39145-distro-basic-smithi/ - no related failures or out-of-order issues in this run
@dzafman @athanatos This PR fixes a bug which causes multiple dead jobs in every rados run. If you are happy with this version, I'd like to merge it.