mon/OSDMonitor: do not mark newly created OSDs OUT by liewegas · Pull Request #39631 · ceph/ceph

liewegas · 2021-02-22T23:48:29Z

Currently, new OSDs are marked OUT. This behavior appears to date back
all the way to the 'osd new' command in c9e6cac, and for 'osd create'
from 118f081. The first commit has no real explanation, but it
presumably inherited the behavior from the second. That second commit,
though, says

if we are creating an osd which has the same id as a previously
removed 'in' osd, we should not mark this newly created osd as 'in'

This isn't actually a good idea, however. If we are creating (or reusing)
an OSD id, the OSD that starts up will have no data, so no matter what
there will be a data migration from the before state to the final state.
If we mark the OSD OUT when the id is allocated but before the OSD
starts up, we create a middle state: PGs are mapped to the id and then
remapped away (because it is out), so a bunch of peering, and possibly
some data transfer, happens before the OSD starts up and marks itself in.

Instead, we have two cases:

1) If we are reusing a DESTROYED osd id, we should leave the in/out
   state the way it was. This way we still go straight from the before
   state to the after state (the osd will mark itself in when it starts up).
2) If we are allocating a new id in do_osd_create(), we should mark the
   osd IN. That way, the in-between state is that the OSD is down,
   rather than that it exists but is out and PGs are mapped to some other
   intermediate location.
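The two cases above can be illustrated with a small toy model. This is only a sketch of the policy as described here, not the monitor's actual code: `OsdState`, `state_after_create`, and `state_after_boot` are hypothetical names invented for the illustration, and the model ignores CRUSH and PG mapping entirely.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OsdState:
    """Hypothetical, simplified view of one OSD id in the OSDMap."""
    exists: bool
    is_in: bool              # IN means a non-zero OSDMap weight
    up: bool
    destroyed: bool = False


def state_after_create(previous: Optional[OsdState]) -> OsdState:
    """State of an id right after it is allocated, before the daemon starts."""
    if previous is not None and previous.destroyed:
        # Case 1: reusing a DESTROYED id. Leave in/out exactly as it was.
        return replace(previous, destroyed=False, up=False)
    # Case 2: brand-new id. Mark it IN; it stays down until the daemon boots.
    return OsdState(exists=True, is_in=True, up=False)


def state_after_boot(state: OsdState) -> OsdState:
    """The OSD starts up and ends up both up and in."""
    return replace(state, up=True, is_in=True)
```

In this model a new id goes from "does not exist" to "in but down" to "in and up", so its in/out state never changes after allocation. A reused DESTROYED id keeps whatever in/out state it had until the daemon boots.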

Checklist

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug

Currently, new OSDs are marked OUT. This behavior appears to date back all the way to the 'osd new' command in c9e6cac, and for 'osd create' from 118f081. The first commit has no real explanation, but it presumably inherited the behavior from the second. That second commit, though, says

  if we are creating an osd which has the same id as a previously
  removed 'in' osd, we should not mark this newly created osd as 'in'

This isn't actually a good idea, however. If we are creating (or reusing) an OSD id, the OSD that starts up will have no data, so no matter what there will be a data migration from the before state to the final state. If we mark the OSD OUT when the id is allocated but before the OSD starts up, PGs are mapped to the id (by virtue of the CRUSH weight) and then remapped away (due to out), creating a middle state where a bunch of PGs will repeer and maybe data will move.

Instead, we have two cases:

1) If we are reusing a DESTROYED osd id, we should leave the in/out state the way it was. This way we still go straight from the before state to the after state (the osd will mark itself in when it starts up).

2) If we are allocating a new id in do_osd_create(), we want the OSD to be IN, so there is no middle state. Unfortunately, we have to work around apply_incremental() being obnoxious here: its sloppy implementation will implicitly set EXISTS by virtue of new_osd_weight (the mark IN part) before applying the osd_state XOR, so be careful! (This behavior is mirrored by the Linux kernel implementation too, thankfully.)

Signed-off-by: Sage Weil <sage@newdream.net>
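The ordering hazard described in case 2 can be reproduced in a few lines. The sketch below is a deliberately simplified toy, not the real `apply_incremental()`: the flag values, `IN_WEIGHT`, and the helper `xor_for_create` are assumptions made for illustration, and the masking shown is only one possible way to account for the implicit EXISTS, not necessarily what this PR does.

```python
from typing import Optional, Tuple

# Illustrative flag and weight values (assumed for this toy model).
EXISTS = 1 << 0
UP = 1 << 1
NEW = 1 << 3
IN_WEIGHT = 0x10000


def apply_incremental(state: int, weight: int,
                      new_weight: Optional[int],
                      state_xor: int) -> Tuple[int, int]:
    """Toy version of the ordering described above.

    Step 1: apply the weight; a non-zero weight implicitly sets EXISTS.
    Step 2: XOR the state mask into the (possibly already modified) state.
    """
    if new_weight is not None:
        weight = new_weight
        if new_weight > 0:
            state |= EXISTS
    state ^= state_xor
    return state, weight


def xor_for_create(mark_in: bool) -> int:
    """Hypothetical helper: XOR mask that ends with EXISTS|NEW set."""
    mask = EXISTS | NEW
    if mark_in:
        # EXISTS is already set by the weight step; XOR-ing it again
        # would clear it, so leave it out of the mask.
        mask &= ~EXISTS
    return mask


# Naive: mark IN and XOR in EXISTS|NEW. EXISTS gets toggled back off.
naive_state, _ = apply_incremental(0, 0, IN_WEIGHT, EXISTS | NEW)

# Careful: account for the implicit EXISTS from the weight step.
ok_state, ok_weight = apply_incremental(0, 0, IN_WEIGHT, xor_for_create(True))
```

With these inputs `naive_state` ends up as `NEW` only (the OSD no longer "exists"), while `ok_state` is `EXISTS | NEW` with the IN weight applied.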

liewegas requested a review from jdurgin February 22, 2021 23:48

github-actions bot added core mon labels Feb 22, 2021

jdurgin approved these changes Feb 23, 2021

jdurgin added the needs-qa label Feb 23, 2021

liewegas added wip-sage-testing and removed wip-sage-testing labels Feb 23, 2021

liewegas force-pushed the fix-add-osd-out branch from 0d88aed to c8f021c February 24, 2021 19:55

mon/OSDMonitor: behave if inc map sets weight on not-yet-existing OSD

678dc40

Signed-off-by: Sage Weil <sage@newdream.net>

liewegas added the wip-sage-testing label Feb 27, 2021

osd/OSDMap: don't warn on NEW osd ids

7aba184

If we allocate a new OSD, don't raise a health alert about it.

Signed-off-by: Sage Weil <sage@newdream.net>

liewegas merged commit f11ccd2 into ceph:master Mar 1, 2021

liewegas mentioned this pull request Mar 1, 2021

pacific: mon/OSDMonitor: do not mark newly created OSDs OUT #39748

Merged

liewegas merged 3 commits into ceph:master from liewegas:fix-add-osd-out
