osd/PG: restrict async_recovery_targets to up osds#22664

Merged
jdurgin merged 1 commit into ceph:master from neha-ojha:wip-fix-choose-acting
Jun 21, 2018
@neha-ojha
Member

When an OSD that is in the acting set but not in the up set is chosen as an
async_recovery_target, it is removed from the acting set. Since this OSD is
then in neither the up set nor the acting set, it is classified as a stray in
the next peering cycle. As a result, choose_acting() loops between two
proposed acting sets.

To avoid this, only up OSDs are chosen as async_recovery_targets.

Fixes: https://tracker.ceph.com/issues/24487
Signed-off-by: Neha Ojha <nojha@redhat.com>
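
The filtering described above can be sketched in isolation. This is a simplified illustration, not the code in osd/PG: `PGSketch`, `async_recovery_candidates`, and the `restrict_to_up` flag are hypothetical names, plain ints stand in for shards, and the real candidate selection applies further criteria that are omitted here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for the PG state that choose_acting()
// consults. Plain ints stand in for OSD ids / shards.
struct PGSketch {
  std::set<int> up;      // OSDs in the up set
  std::set<int> acting;  // OSDs in the acting set
  std::set<int> strays;  // OSDs already classified as strays

  bool is_up(int osd) const { return up.count(osd) > 0; }
};

// Return the acting OSDs that may be considered as async recovery targets.
// With restrict_to_up == false this mimics the behaviour before the fix;
// with restrict_to_up == true it applies the check added by this change.
std::vector<int> async_recovery_candidates(const PGSketch& pg,
                                           bool restrict_to_up) {
  std::vector<int> candidates;
  for (int osd : pg.acting) {
    // do not include strays
    if (pg.strays.count(osd) > 0)
      continue;
    // An acting-only OSD that is moved out of the acting set ends up in
    // neither set, so the next peering cycle would treat it as a stray.
    if (restrict_to_up && !pg.is_up(osd))
      continue;
    candidates.push_back(osd);
  }
  return candidates;
}
```

With up = {0, 1, 2} and acting = {0, 1, 3}, the unrestricted variant still offers osd 3 as a candidate, while the restricted variant only offers osds 0 and 1, so an acting-only OSD is never moved out of the acting set this way.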

Contributor

@dzafman left a comment

lgtm

@neha-ojha
Member Author

No failures in 50 runs of the test that failed earlier:
http://pulpito.ceph.com/nojha-2018-06-20_22:12:32-rados:thrash-wip-24487-distro-basic-smithi/

No related failures in the rados suite run:
http://pulpito.ceph.com/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/

@neha-ojha requested a review from jdurgin on June 21, 2018 17:53
  // do not include strays
  if (stray_set.find(shard_i) != stray_set.end())
    continue;
  if (!is_up(shard_i))
Member

maybe add a comment explaining why !up shards are excluded?

Member Author

updated.

@neha-ojha force-pushed the wip-fix-choose-acting branch from f19af6a to 7f1b6ad on June 21, 2018 18:17