osd/PG: do not choose stray osds as async_recovery_targets #22330

Merged
jdurgin merged 1 commit into ceph:master from neha-ojha:wip-async-up
May 31, 2018
neha-ojha commented May 30, 2018

Without this change, we might accept stray osds as async_recovery_targets,
and would then need to ensure that they get a chance to become part of the
acting set after recovery is over.

However, when choose_acting() is called in the Recovered state, we set
restrict_to_up_acting=true, which does not allow them to get back to the
acting set.

Therefore, similar to backfill, do not allow stray osds to become
async_recovery_targets.

Fixes: https://tracker.ceph.com/issues/24349
Signed-off-by: Neha Ojha <nojha@redhat.com>
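
For readers unfamiliar with the code path, the rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `OsdId`, `pick_async_recovery_targets`, and the `candidates`/`strays` parameters are hypothetical names chosen for the example, and the real selection logic in osd/PG considers more than this.

```cpp
#include <set>
#include <vector>

// Illustrative type only; this is not Ceph's real OSD/shard type.
using OsdId = int;

// Hypothetical helper: pick async recovery targets from `candidates`,
// skipping every OSD listed in `strays`, similar to how the description
// says strays are handled for backfill.
std::set<OsdId> pick_async_recovery_targets(
    const std::vector<OsdId>& candidates,
    const std::set<OsdId>& strays)
{
  std::set<OsdId> targets;
  for (OsdId osd : candidates) {
    if (strays.count(osd) != 0) {
      // With restrict_to_up_acting=true in the Recovered state, a stray
      // would not be allowed back into the acting set, so never pick it.
      continue;
    }
    targets.insert(osd);
  }
  return targets;
}
```

With candidates {1, 2, 3} and strays {3}, the sketch would select only OSDs 1 and 2, so no target is left waiting for an acting-set slot it cannot get.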
neha-ojha requested review from jdurgin and liewegas May 31, 2018 00:01
liewegas changed the title "PG: do not choose stray osds as async_recovery_targets" to "osd/PG: do not choose stray osds as async_recovery_targets" May 31, 2018

jdurgin commented May 31, 2018

Looks good. Can you add a tracker issue, if one doesn't exist, so we can track the backport to mimic?

neha-ojha commented May 31, 2018

There exists one https://tracker.ceph.com/issues/23827, which does not directly highlight the underlying issue. Should we use this?

neha-ojha commented
@jdurgin created a new issue with the failure that highlighted this problem.

neha-ojha commented

None of the failures look related. Should be ready to merge.

jdurgin merged commit 11aa333 into ceph:master May 31, 2018

xiexingguo commented
> There exists one https://tracker.ceph.com/issues/23827, which does not directly highlight the underlying issue. Should we use this?

@neha-ojha @jdurgin I have some trouble understanding the direct relationship between https://tracker.ceph.com/issues/23827 and this PR. Shouldn't https://tracker.ceph.com/issues/23827 be addressed by #21909 instead?

Would you mind sharing some insights? Thanks!

neha-ojha commented
@xiexingguo https://tracker.ceph.com/issues/23827 was seen in luminous and was addressed by #21909, which was merged to master on May 9, 2018. We then saw https://tracker.ceph.com/issues/24349 on May 24, 2018. On further investigation, we realized that the cause was different this time and needed #22330 as a fix. This fix is specific to master and mimic, since it is related to async recovery.