osd: Cancel in-progress scrubs (not user requested)#35909
neha-ojha merged 5 commits into ceph:master
0522dc0 to 39fa981
@neha-ojha I need to figure out how to clean up properly when aborting a scrub.
jenkins test make check
@neha-ojha I manually tested the following: after an aborted scrub, with noscrub still set, issue a user-requested scrub and observe it start and finish.
jenkins test dashboard backend
neha-ojha
left a comment
This looks good; it needs a test along the lines of #35909 (comment).
I am seeing some "Exiting scrub checking -- not all pgs scrubbed." failures in https://pulpito.ceph.com/kchai-2020-07-07_06:01:29-rados-wip-kefu-testing-2020-07-07-1058-distro-basic-smithi/ so I will drop this change from my batch and rerun.
The rerun at https://pulpito.ceph.com/kchai-2020-07-08_03:24:15-rados-wip-kefu-testing-2020-07-07-1058-distro-basic-smithi/ still has lots of failures, so they are not related to this change.
@neha-ojha The functional tests now included pass, but other scrub tests are failing with this change. I need to investigate why.
TESTING PASSED https://pulpito.ceph.com/dzafman-2020-07-18_09:43:13-rados-wip-zafman-testing-distro-basic-smithi
@neha-ojha There are other, unrelated errors on fairly recent master: 5238049
The failure in qa/tasks/repair_test.py was caused by this change: noscrub and nodeep-scrub are set when the repair request initiates another scrub to verify the repair.
Testing fix 47792d2: PASSED single jobs
@neha-ojha One issue remains. I'm concerned about the auto repair feature, which can cause the sequence scrub -> deep-scrub -> repair -> deep-scrub. If the original scrub is user requested, then none of the subsequent scrubs should be blocked by noscrub or nodeep-scrub.
@neha-ojha I forgot that auto repair only applies to scheduled scrubs. So any auto repair can be interrupted by noscrub or nodeep-scrub. There is no change needed to handle existing auto repair functionality. |
src/osd/PG.cc
Outdated
{
  // Since repair is only by request and we need to scrub afterward
  // treat the same as req_scrub.
  if (!scrubber.req_scrub && !scrubber.check_repair) {
Do we need the check_repair check based on #35909 (comment)?
> Do we need the check_repair check based on #35909 (comment)?
It was discovered by the "tasks/repair_test" teuthology test. Since a direct repair does a check_repair (a deep scrub), it should NOT be affected by noscrub/nodeep-scrub. Of course, now that I think about it, if nodeep-scrub is set after an auto repair has already happened and set check_repair, that deep-scrub would NOT be aborted. It could be argued that if you have come that far and completed an auto repair, you should go ahead and finish all the clean-up. On the other hand, the user may want all scheduled scrubbing-related activity to stop immediately.
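To make the condition under review easier to reason about, here is a minimal, self-contained sketch of the abort decision as discussed in this thread. It is not the code from src/osd/PG.cc: the `ScrubberState` and `ScrubFlags` structs, the `deep` field, and the `should_abort_scrub()` helper are hypothetical names introduced for illustration. The mapping of deep scrubs to nodeep-scrub and shallow scrubs to noscrub is also an assumption.

```cpp
// Hypothetical stand-in for the PG scrubber state discussed above.
struct ScrubberState {
  bool req_scrub = false;     // scrub, deep_scrub, or repair was requested by a user
  bool check_repair = false;  // a deep scrub is pending to verify a repair
  bool deep = false;          // the in-progress scrub is a deep scrub (assumed field)
};

// Hypothetical view of the noscrub / nodeep-scrub flags.
struct ScrubFlags {
  bool noscrub = false;
  bool nodeep_scrub = false;
};

// Returns true when an in-progress scrub should be cancelled.
// Mirrors the reviewed condition: user-requested scrubs and the
// post-repair verification scrub are never cancelled by the flags.
bool should_abort_scrub(const ScrubberState& s, const ScrubFlags& f) {
  if (s.req_scrub || s.check_repair) {
    return false;
  }
  // Assumption: a deep scrub is governed by nodeep-scrub,
  // a shallow scrub by noscrub.
  return s.deep ? f.nodeep_scrub : f.noscrub;
}
```

With both flags set, a scheduled scrub is cancelled under this sketch, while a scrub with `req_scrub` or `check_repair` set continues. As noted above, this also means a deep scrub that follows an auto repair would not be aborted once `check_repair` is set.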
Release note added to PendingReleaseNotes in a commit to be squashed.
Another rados run passed: Same unrelated failures as before |
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
This change adds a new scrubber.req_scrub to track a user-requested scrub, deep_scrub, or repair.
Fixes: https://tracker.ceph.com/issues/46275
Signed-off-by: David Zafman <dzafman@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
jenkins test dashboard backend
Currently, after an aborted scrub, the next scrub gets stuck. PG::abort_scrub() is not sufficient, even though it is based on the existing code that aborts a scrub when the PG changes out from under it.