Skip to content

mon/OSDMonitor: do not mark newly created OSDs OUT#39631

Merged
liewegas merged 3 commits intoceph:masterfrom
liewegas:fix-add-osd-out
Mar 1, 2021
Merged

mon/OSDMonitor: do not mark newly created OSDs OUT#39631
liewegas merged 3 commits intoceph:masterfrom
liewegas:fix-add-osd-out

Conversation

@liewegas
Copy link
Member

This behavior appears to date back all the way to the 'osd new' command
in c9e6cac, and for 'osd create' from
118f081. The first commit has no
real explanation, but it presumably inherited it from the second. That
second commit, though, says

if we are creating an osd which has the same id as a previously
removed 'in' osd, we should not mark this newly created osd as 'in'

This isn't actually a good idea, however. If we are creating (or reusing)
a new OSD id, the OSD that starts up will have no data. So no matter what
there will be a data migration from the before state to the final state.
If we mark the osd OUT when the osd id is allocated but before the OSD
starts up, we'll create a middle state where PGs are mapped to the id and
then remapped (due to out) and a bunch of peering, and possibly some
data transfer will actually happen before the osd starts up and marks
itself in.

Instead, we have two cases:

  1. If we are reusing a DESTROYED osd id, we should leave the in/out
    state the way it was. This way we still go straight from the before
    state to the after state (the osd will mark itself in when it starts up).

  2. If we are allocating a new id in do_osd_create(), we should mark the
    osd IN. That way, there the inbetween state will be that the OSD is
    down--not that it exists but is out and PGs are mapped to some other
    intermediate location.

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Currently, new OSDs are marked OUT.

This behavior appears to date back all the way to the 'osd new' command
in c9e6cac, and for 'osd create' from
118f081.  The first commit has no
real explanation, but it presumably inherited it from the second. That
second commit, though, says

    if we are creating an osd which has the same id as a previously
    removed 'in' osd, we should not mark this newly created osd as 'in'

This isn't actually a good idea, however.  If we are creating (or reusing)
a new OSD id, the OSD that starts up will have no data.  So no matter what
there will be a data migration from the before state to the final state.
If we mark the osd OUT when the osd id is allocated but before the OSD
starts up, we'll create a middle state where PGs are mapped to the id (by
virtue of the CRUSH weight) and then remapped away (due to out), creating
a middle state where a bunch of PGs will repeer and maybe data will move.

Instead, we have two cases:

1) If we are reusing a DESTROYED osd id, we should leave the in/out
state the way it was.  This way we still go straight from the before
state to the after state (the osd will mark itself in when it starts up).

2) If we are allocating a new id in do_osd_create(), we want the OSD
to be IN, so there is no middle state.  Unfortunately, we have to work
around apply_incremental() being obnoxious here: it's sloppy implementation
will implicitly set EXISTS by virtue of new_osd_weight (the mark IN part)
before applying the osd_state XOR, so be careful!  (This behavior is
mirrored by the Linux kernel implementation too, thankfully.)

Signed-off-by: Sage Weil <sage@newdream.net>
If we allocate a new OSD, don't raise a health alert about it.

Signed-off-by: Sage Weil <sage@newdream.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants