osd/PG: do not choose stray osds as async_recovery_targets#22330
jdurgin merged 1 commit into ceph:master
Without this change, we might accept stray osds as async_recovery_targets, and need to ensure that they get a chance to become part of the acting set after recovery is over. However, when choose_acting() is called in the Recovered state, we set restrict_to_up_acting=true, which does not allow them to get back to the acting set. Therefore, similar to backfill, do not allow stray osds to become async_recovery_targets.

Signed-off-by: Neha Ojha <nojha@redhat.com>
Looks good. Can you add a tracker issue, if one doesn't already exist, so we can track the backport to mimic?
One exists, https://tracker.ceph.com/issues/23827, but it does not directly highlight the underlying issue. Should we use it?
@jdurgin created a new issue with the failure that highlighted this problem.
None of the failures look related. Should be ready to merge.
@neha-ojha @jdurgin I have some trouble understanding the direct relationship between https://tracker.ceph.com/issues/23827 and this PR. Shouldn't https://tracker.ceph.com/issues/23827 be addressed by #21909 instead? Would you mind sharing some insights? Thanks!
@xiexingguo https://tracker.ceph.com/issues/23827 was seen in luminous and was addressed by #21909, which was merged to master on May 9, 2018. We then saw https://tracker.ceph.com/issues/24349 on May 24, 2018. On further investigation, we realized that the cause was different this time and that #22330 was needed as a fix. This fix is specific to master and mimic, since it is related to async recovery.
Without this change, we might accept stray osds as async_recovery_targets,
and need to ensure that they get a chance to become part of the acting set
after recovery is over.
However, when choose_acting() is called in the Recovered state, we set
restrict_to_up_acting=true, which does not allow them to get back to the
acting set.
Therefore, similar to backfill, do not allow stray osds to become
async_recovery_targets.
Fixes: https://tracker.ceph.com/issues/24349
Signed-off-by: Neha Ojha <nojha@redhat.com>
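
The rule described in the commit message can be illustrated with a minimal, self-contained C++ sketch. This is not the actual patch and does not use Ceph's types: `OsdId`, `choose_async_recovery_targets_sketch`, and the way the stray set is passed in are all hypothetical names chosen for illustration. It only shows the idea that a candidate found in the stray set is skipped when async recovery targets are chosen.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an OSD identifier; Ceph itself uses richer types.
using OsdId = int;

// Sketch only: returns the candidates that may become async recovery targets.
// Any candidate present in stray_set is skipped, mirroring the rule
// "do not allow stray osds to become async_recovery_targets".
std::set<OsdId> choose_async_recovery_targets_sketch(
    const std::vector<OsdId>& candidates,
    const std::set<OsdId>& stray_set) {
  std::set<OsdId> targets;
  for (OsdId osd : candidates) {
    // A stray could not rejoin the acting set once recovery finishes
    // (restrict_to_up_acting=true in the Recovered state), so leave it out.
    if (stray_set.count(osd) != 0) {
      continue;
    }
    targets.insert(osd);
  }
  return targets;
}
```

For example, with candidates `{1, 2, 3}` and a stray set of `{2}`, the sketch selects only OSDs 1 and 3, so no stray ever has to be moved back into the acting set after recovery.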