Skip to content

cephadm: handle adopting offline OSDs#34565

Merged
sebastian-philipp merged 4 commits intoceph:masterfrom
SUSE:wip-fix-45095
Apr 22, 2020
Merged

cephadm: handle adopting offline OSDs#34565
sebastian-philipp merged 4 commits intoceph:masterfrom
SUSE:wip-fix-45095

Conversation

@tserong
Copy link
Member

@tserong tserong commented Apr 15, 2020

The current adopt behavior expects OSDs to be online, in order to read /var/lib/ceph/osd/ceph-$ID/fsid. To handle the case where OSDs are offline, this change first checks to see if that file is present, and if not, falls back to calling ceph-volume lvm list to see if there's a matching OSD there, and if that doesn't work, it checks /etc/ceph/osd/*.json to see if there's a matching old-style simple OSD present.

For LVM OSDs, the only thing we need is the ODS's fsid; the remainer of the adopt procedure "just works", as the various other files in /var/lib/ceph/$FSID/osd.$ID are created by magic anyway when the OSD is activated, so it doesn't matter if they're not present at adoption time.

For simple (ceph-disk created) OSDs, we actually need all the files under /var/lib/ceph/osd/ceph-$ID/ to be moved to /var/lib/ceph/$FSID/osd.$ID so if a simple OSD is found, it's mounted first, so the existing move_files() a bit further down around line 3200 continues to work.

Fixes: https://tracker.ceph.com/issues/45095
Signed-off-by: Tim Serong tserong@suse.com

@tserong tserong requested a review from a team as a code owner April 15, 2020 09:42
@tserong tserong changed the title cephadm: handle adopting offline OSDSs cephadm: handle adopting offline OSDs Apr 15, 2020
@tserong tserong added the DNM label Apr 16, 2020
tserong added 4 commits April 16, 2020 14:55
The current adopt behavior expects OSDs to be online, in order to read
/var/lib/ceph/osd/ceph-$ID/fsid.  To handle the case where OSDs
are offline, this change first checks to see if that file is present,
and if not, falls back to calling `ceph-volume lvm list` to see if
there's a matching OSD there, and if that doesn't work, it checks
/etc/ceph/osd/*.json to see if there's a matching old-style simple
OSD present.

For LVM OSDs, the only thing we need is the ODS's fsid; the remainer
of the adopt procedure "just works", as the various other files
in /var/lib/ceph/$FSID/osd.$ID are created by magic anyway when the
OSD is activated, so it doesn't matter if they're not present at
adoption time.

For simple (ceph-disk created) OSDs, we actually need all the files under
/var/lib/ceph/osd/ceph-$ID/ to be moved to /var/lib/ceph/$FSID/osd.$ID
so if a simple OSD is found, it's mounted first, so the existing
move_files() a bit further down around line 3200 continues to work.

Fixes: https://tracker.ceph.com/issues/45095
Signed-off-by: Tim Serong <tserong@suse.com>
When adopting OSDs, if a ceph-volume simple service is already disabled
(or otherwise missing) the previous implementation would raise an error,
thus killing the adopt.

Signed-off-by: Tim Serong <tserong@suse.com>
Current behaviour is to only start a newly adopted ceph daemon if it was
already running before the adopt.  Adding a --force-start option allows
the adopt command to start newly adopted daemons that weren't originally
running, to save the user having to manually invoke `systemctl start
ceph-$FSID@$DAEMMON.$ID`.

Signed-off-by: Tim Serong <tserong@suse.com>
In case someone tries to run this again on an already adopted daemon...

Signed-off-by: Tim Serong <tserong@suse.com>
@tserong
Copy link
Member Author

tserong commented Apr 16, 2020

Minor rework to add AdoptOsd class, and a little bit more cleanup. I want to re-test again, just to make sure, but this should be about right now...

@tserong tserong removed the DNM label Apr 17, 2020
@tserong
Copy link
Member Author

tserong commented Apr 17, 2020

Re-tested adopting online LVM, offline LVM and offline simple OSDs. Also tested adopting offline MON and MGR with --force-start set. I'm pretty happy with this now.

Copy link
Contributor

@mgfritch mgfritch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few minor nits, but otherwise lgtm!

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Apr 20, 2020
@sebastian-philipp
Copy link
Contributor

@sebastian-philipp sebastian-philipp merged commit 2e23a54 into ceph:master Apr 22, 2020
@tserong
Copy link
Member Author

tserong commented Apr 22, 2020

just a few minor nits, but otherwise lgtm!

Thanks @mgfritch. Unfortunately I didn't manage to get back to those nits yet, and this PR is now merged. I'll take another look tomorrow and maybe open another PR, or slip them in a separate commit with some other related work.

@sebastian-philipp
Copy link
Contributor

whoopsie. completely missed the open bits.

@tserong tserong deleted the wip-fix-45095 branch April 23, 2020 06:34
tserong added a commit to SUSE/ceph that referenced this pull request Apr 23, 2020
This addresses a couple of the comments in
ceph#34565

Signed-off-by: Tim Serong <tserong@suse.com>
tserong added a commit to SUSE/ceph that referenced this pull request Apr 23, 2020
This addresses a couple of the comments in
ceph#34565

Signed-off-by: Tim Serong <tserong@suse.com>
tserong added a commit to SUSE/ceph that referenced this pull request Apr 28, 2020
This addresses a couple of the comments in
ceph#34565

Signed-off-by: Tim Serong <tserong@suse.com>
sebastian-philipp pushed a commit to sebastian-philipp/ceph that referenced this pull request May 21, 2020
This addresses a couple of the comments in
ceph#34565

Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit df20182)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants