mon/OSDMonitor: do not mark newly created OSDs OUT#39631
Merged
liewegas merged 3 commits intoceph:masterfrom Mar 1, 2021
Merged
mon/OSDMonitor: do not mark newly created OSDs OUT#39631liewegas merged 3 commits intoceph:masterfrom
liewegas merged 3 commits intoceph:masterfrom
Conversation
jdurgin
approved these changes
Feb 23, 2021
Currently, new OSDs are marked OUT. This behavior appears to date back all the way to the 'osd new' command in c9e6cac, and for 'osd create' from 118f081. The first commit has no real explanation, but it presumably inherited it from the second. That second commit, though, says if we are creating an osd which has the same id as a previously removed 'in' osd, we should not mark this newly created osd as 'in' This isn't actually a good idea, however. If we are creating (or reusing) a new OSD id, the OSD that starts up will have no data. So no matter what there will be a data migration from the before state to the final state. If we mark the osd OUT when the osd id is allocated but before the OSD starts up, we'll create a middle state where PGs are mapped to the id (by virtue of the CRUSH weight) and then remapped away (due to out), creating a middle state where a bunch of PGs will repeer and maybe data will move. Instead, we have two cases: 1) If we are reusing a DESTROYED osd id, we should leave the in/out state the way it was. This way we still go straight from the before state to the after state (the osd will mark itself in when it starts up). 2) If we are allocating a new id in do_osd_create(), we want the OSD to be IN, so there is no middle state. Unfortunately, we have to work around apply_incremental() being obnoxious here: it's sloppy implementation will implicitly set EXISTS by virtue of new_osd_weight (the mark IN part) before applying the osd_state XOR, so be careful! (This behavior is mirrored by the Linux kernel implementation too, thankfully.) Signed-off-by: Sage Weil <sage@newdream.net>
0d88aed to
c8f021c
Compare
Signed-off-by: Sage Weil <sage@newdream.net>
If we allocate a new OSD, don't raise a health alert about it. Signed-off-by: Sage Weil <sage@newdream.net>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This behavior appears to date back all the way to the 'osd new' command
in c9e6cac, and for 'osd create' from
118f081. The first commit has no
real explanation, but it presumably inherited it from the second. That
second commit, though, says
This isn't actually a good idea, however. If we are creating (or reusing)
a new OSD id, the OSD that starts up will have no data. So no matter what
there will be a data migration from the before state to the final state.
If we mark the osd OUT when the osd id is allocated but before the OSD
starts up, we'll create a middle state where PGs are mapped to the id and
then remapped (due to out) and a bunch of peering, and possibly some
data transfer will actually happen before the osd starts up and marks
itself in.
Instead, we have two cases:
If we are reusing a DESTROYED osd id, we should leave the in/out
state the way it was. This way we still go straight from the before
state to the after state (the osd will mark itself in when it starts up).
If we are allocating a new id in do_osd_create(), we should mark the
osd IN. That way, there the inbetween state will be that the OSD is
down--not that it exists but is out and PGs are mapped to some other
intermediate location.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox