cephadm: handle adopting offline OSDs #34565
Merged
sebastian-philipp merged 4 commits into ceph:master on Apr 22, 2020
sebastian-philipp approved these changes Apr 15, 2020
The current adopt behavior expects OSDs to be online, in order to read /var/lib/ceph/osd/ceph-$ID/fsid. To handle the case where OSDs are offline, this change first checks whether that file is present. If not, it falls back to calling `ceph-volume lvm list` to look for a matching OSD there, and if that doesn't work, it checks /etc/ceph/osd/*.json for a matching old-style simple OSD.

For LVM OSDs, the only thing we need is the OSD's fsid; the remainder of the adopt procedure "just works", because the various other files in /var/lib/ceph/$FSID/osd.$ID are created automatically when the OSD is activated, so it doesn't matter if they're not present at adoption time.

For simple (ceph-disk created) OSDs, we actually need all the files under /var/lib/ceph/osd/ceph-$ID/ to be moved to /var/lib/ceph/$FSID/osd.$ID, so if a simple OSD is found, it's mounted first. That way the existing move_files() a bit further down (around line 3200) continues to work.

Fixes: https://tracker.ceph.com/issues/45095
Signed-off-by: Tim Serong <tserong@suse.com>
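The lookup order described in this commit message can be sketched roughly as below. This is an illustrative sketch, not the code from the PR: the function names are hypothetical, the `ceph-volume lvm list` output is assumed to be already parsed JSON keyed by OSD id with a `ceph.osd_fsid` tag per device, and the simple-OSD JSON files are assumed to carry `whoami` and `fsid` keys. The mount step for simple OSDs is left out.

```python
import glob
import json
import os
from typing import Optional


def fsid_from_data_dir(data_dir: str) -> Optional[str]:
    """Online case: read the fsid file from the OSD's data directory."""
    try:
        with open(os.path.join(data_dir, 'fsid')) as f:
            return f.read().strip() or None
    except OSError:
        # Directory not mounted or file missing: the OSD is probably offline.
        return None


def fsid_from_lvm_list(lvm_list: dict, osd_id: str) -> Optional[str]:
    """Offline LVM case: search parsed `ceph-volume lvm list` JSON output.

    Assumed shape: {"<osd id>": [{"tags": {"ceph.osd_fsid": "..."}}, ...]}
    """
    for device in lvm_list.get(osd_id, []):
        fsid = device.get('tags', {}).get('ceph.osd_fsid')
        if fsid:
            return fsid
    return None


def fsid_from_simple_json(json_dir: str, osd_id: str) -> Optional[str]:
    """Offline simple case: scan the *.json files for a matching OSD id.

    Assumed keys in each file: "whoami" (OSD id) and "fsid" (OSD fsid).
    """
    for path in sorted(glob.glob(os.path.join(json_dir, '*.json'))):
        try:
            with open(path) as f:
                osd = json.load(f)
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it
        if str(osd.get('whoami')) == osd_id:
            return osd.get('fsid')
    return None


def find_osd_fsid(osd_id: str, data_dir: str,
                  lvm_list: dict, json_dir: str) -> Optional[str]:
    """Try each source in the order given in the commit message."""
    return (fsid_from_data_dir(data_dir)
            or fsid_from_lvm_list(lvm_list, osd_id)
            or fsid_from_simple_json(json_dir, osd_id))
```

Passing the parsed LVM listing and the directories in as arguments keeps the sketch testable without a running cluster; a real implementation would obtain them from `ceph-volume` and the host filesystem.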
When adopting OSDs, if a ceph-volume simple service is already disabled (or otherwise missing) the previous implementation would raise an error, thus killing the adopt. Signed-off-by: Tim Serong <tserong@suse.com>
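The commit message does not show the fix itself. One way to tolerate an already disabled or missing unit is to report the failure rather than raise, roughly as follows. This is a sketch only: `disable_unit` and its injectable `run` parameter are hypothetical names, not cephadm functions.

```python
import subprocess
from typing import Callable, Sequence


def disable_unit(unit: str,
                 run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Try to disable a systemd unit without aborting the caller on failure.

    Returns True if `systemctl disable` exited with 0, False otherwise
    (for example when the unit does not exist or systemctl is unavailable).
    """
    try:
        return run(['systemctl', 'disable', unit]) == 0
    except OSError:
        # systemctl itself could not be executed.
        return False
```

The caller can then log a warning on `False` and carry on with the adoption instead of stopping.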
Current behaviour is to only start a newly adopted ceph daemon if it was already running before the adopt. Adding a --force-start option allows the adopt command to start newly adopted daemons that weren't originally running, to save the user having to manually invoke `systemctl start ceph-$FSID@$DAEMON.$ID`. Signed-off-by: Tim Serong <tserong@suse.com>
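The start decision described here comes down to a single boolean check. The sketch below shows it alongside the unit name from the commit message. The helper names and the parser are hypothetical; only the `--force-start` flag and the `ceph-$FSID@$DAEMON.$ID` unit pattern come from the text above, with `$DAEMON` assumed to be the daemon type (osd, mon, mgr).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser carrying only the flag discussed here."""
    parser = argparse.ArgumentParser(prog='adopt-sketch')
    parser.add_argument(
        '--force-start', action='store_true',
        help='start the adopted daemon even if it was not running before')
    return parser


def should_start(was_running: bool, force_start: bool) -> bool:
    """Start if the daemon was running before the adopt, or if forced."""
    return was_running or force_start


def unit_name(fsid: str, daemon_type: str, daemon_id: str) -> str:
    """Build the systemd unit name in the ceph-$FSID@$DAEMON.$ID pattern."""
    return f'ceph-{fsid}@{daemon_type}.{daemon_id}'
```

Without the flag, a daemon that was offline at adoption time stays stopped and the user starts the resulting unit by hand.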
In case someone tries to run this again on an already adopted daemon... Signed-off-by: Tim Serong <tserong@suse.com>
Member
Author
Minor rework to add AdoptOsd class, and a little bit more cleanup. I want to re-test again, just to make sure, but this should be about right now...
Member
Author
Re-tested adopting online LVM, offline LVM and offline simple OSDs. Also tested adopting offline MON and MGR with --force-start set. I'm pretty happy with this now.
sebastian-philipp approved these changes Apr 17, 2020
mgfritch reviewed Apr 17, 2020
mgfritch approved these changes Apr 17, 2020
Contributor
mgfritch
left a comment
just a few minor nits, but otherwise lgtm!
Member
Author
Thanks @mgfritch. Unfortunately I didn't manage to get back to those nits yet, and this PR is now merged. I'll take another look tomorrow and maybe open another PR, or slip them into a separate commit with some other related work.
Contributor
whoopsie. completely missed the open bits.
tserong added a commit to SUSE/ceph that referenced this pull request Apr 23, 2020
This addresses a couple of the comments in ceph#34565 Signed-off-by: Tim Serong <tserong@suse.com>
tserong added a commit to SUSE/ceph that referenced this pull request Apr 28, 2020
This addresses a couple of the comments in ceph#34565 Signed-off-by: Tim Serong <tserong@suse.com>
sebastian-philipp pushed a commit to sebastian-philipp/ceph that referenced this pull request May 21, 2020
This addresses a couple of the comments in ceph#34565 Signed-off-by: Tim Serong <tserong@suse.com> (cherry picked from commit df20182)