mgr: add --max <n> to 'osd ok-to-stop' command by liewegas · Pull Request #39455 · ceph/ceph

liewegas · 2021-02-12T19:33:34Z

Given an initial (set of) OSD(s), if possible, provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together (for example, to upgrade).
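A simplified model of what `--max` asks for: start from the OSDs given on the command line and keep adding OSDs while every PG still has at least `min_size` acting OSDs left. The sketch below assumes a plain `pg_acting` map from PG id to acting OSD ids and a single pool-wide `min_size`. The function names are hypothetical, and this is not the algorithm in `DaemonServer.cc`, which also tracks degraded, inactive and creating/deleting PGs. Where the initial set is already unsafe, the sketch returns an empty list, whereas the real command reports `"ok_to_stop":false`.

```python
from typing import Dict, Iterable, List, Set


def pgs_stay_active(stopped: Set[int], pg_acting: Dict[str, List[int]],
                    min_size: int) -> bool:
    """True if every PG keeps at least min_size acting OSDs that are not stopped."""
    return all(
        sum(1 for osd in acting if osd not in stopped) >= min_size
        for acting in pg_acting.values()
    )


def expand_ok_to_stop(initial: Iterable[int], candidates: Iterable[int],
                      pg_acting: Dict[str, List[int]], min_size: int,
                      max_osds: int) -> List[int]:
    """Greedily add candidate OSDs to the initial set, up to max_osds in total,
    as long as no PG would drop below min_size (hypothetical model)."""
    stopped = set(initial)
    if not pgs_stay_active(stopped, pg_acting, min_size):
        return []  # the initial set alone is already unsafe
    for osd in candidates:
        if len(stopped) >= max_osds:
            break
        if osd in stopped:
            continue
        if pgs_stay_active(stopped | {osd}, pg_acting, min_size):
            stopped.add(osd)
    return sorted(stopped)
```

For example, with PGs `1.0: [0, 1, 2]`, `1.1: [1, 2, 3]`, `1.2: [3, 4, 5]` and `min_size` 2, `expand_ok_to_stop([5], [0, 1, 2, 3, 4], pgs, 2, 20)` returns `[0, 5]`: adding any further OSD would leave some PG with a single acting OSD. Being greedy, the result depends on the order of `candidates`.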

Adjust the command output to dump structured JSON so that we can include:

- which OSD(s) are safe to stop together (since it might include more than what was provided on the command line)
- which PGs would become degraded or become more degraded
- on failure, which PGs would become inactive, are already inactive, or are creating/deleting
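A consumer of the new output might load it into a small typed record. The field names below are taken from the example outputs in this PR. The keys for already-inactive and creating/deleting PGs do not appear in those examples, so this sketch (with the hypothetical `OkToStopResult` and `parse_ok_to_stop` names) does not model them and simply ignores unknown keys.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class OkToStopResult:
    ok_to_stop: bool
    osds: List[int]
    num_ok_pgs: int = 0
    num_not_ok_pgs: int = 0
    bad_become_inactive: List[str] = field(default_factory=list)
    ok_become_degraded: List[str] = field(default_factory=list)


def parse_ok_to_stop(stdout: str) -> OkToStopResult:
    """Parse the JSON printed by 'ceph osd ok-to-stop ... -f json' (keys as in the examples)."""
    data = json.loads(stdout)
    return OkToStopResult(
        ok_to_stop=bool(data["ok_to_stop"]),
        osds=[int(o) for o in data.get("osds", [])],
        num_ok_pgs=int(data.get("num_ok_pgs", 0)),
        num_not_ok_pgs=int(data.get("num_not_ok_pgs", 0)),
        bad_become_inactive=list(data.get("bad_become_inactive", [])),
        ok_become_degraded=list(data.get("ok_become_degraded", [])),
    )
```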

Note that this required some CLI changes:

- the ceph command now prints stdout (the JSON output) even when the exit code is non-zero

For example, a successful return:

$ bin/ceph osd ok-to-stop 5 --max 20

{"ok_to_stop":true,"osds":[0,1,5],"num_ok_pgs":163,"num_not_ok_pgs":0,"ok_become_degraded":["1.0","1.1","1.2","1.3","1.4","1.8","1.9","1.a","1.b","1.c","1.e","1.f","1.10","1.12","1.14","1.15","1.17","1.18","1.19","1.1a","1.1b","1.1c","1.1d","1.1f","1.20","1.21","1.22","1.23","1.27","1.28","1.29","1.2a","1.2c","1.2d","1.2e","1.2f","1.30","1.31","1.32","1.36","1.37","1.39","1.3b","1.3c","1.3d","1.3e","1.3f","1.40","1.41","1.42","1.44","1.45","1.46","1.47","1.48","1.49","1.4a","1.4b","1.4e","1.50","1.52","1.54","1.57","1.59","1.5a","1.5e","1.60","1.62","1.63","1.67","1.69","1.6e","1.6f","1.70","1.71","1.72","1.73","1.75","1.76","1.79","1.7a","1.7c","1.82","1.83","1.84","1.86","1.87","1.88","1.8a","1.8b","1.8c","1.8e","1.8f","1.90","1.91","1.95","1.97","1.98","1.99","1.9a","1.9b","1.9e","1.9f","1.a0","1.a2","1.a3","1.a4","1.a5","1.a6","1.a7","1.a8","1.ae","1.af","1.b0","1.b3","1.b4","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.c1","1.c5","1.c6","1.c7","1.c8","1.ca","1.ce","1.cf","1.d0","1.d4","1.d5","1.d7","1.d8","1.d9","1.db","1.e0","1.e1","1.e3","1.e5","1.e6","1.e7","1.e8","1.e9","1.ea","1.eb","1.ec","1.ef","1.f1","1.f2","1.f3","1.f5","1.f6","1.f8","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}

and a failed command:

$ bin/ceph osd ok-to-stop 5 6 7 8 --max 20

{"ok_to_stop":false,"osds":[5,6,7,8],"num_ok_pgs":176,"num_not_ok_pgs":9,"bad_become_inactive":["1.10","1.16","1.21","1.27","1.3e","1.5f","1.a7","1.c1","1.e9"],"ok_become_degraded":["1.0","1.1","1.2","1.4","1.5","1.6","1.7","1.8","1.9","1.a","1.d","1.11","1.12","1.13","1.15","1.18","1.1a","1.1e","1.20","1.22","1.23","1.24","1.26","1.28","1.29","1.2a","1.2c","1.2d","1.2f","1.31","1.32","1.33","1.35","1.36","1.37","1.38","1.3a","1.3c","1.3d","1.3f","1.40","1.41","1.42","1.43","1.44","1.45","1.46","1.47","1.48","1.49","1.4b","1.4c","1.4e","1.4f","1.51","1.52","1.53","1.54","1.55","1.57","1.59","1.5b","1.5c","1.5d","1.63","1.65","1.68","1.69","1.6a","1.6c","1.6d","1.6e","1.6f","1.70","1.71","1.73","1.74","1.75","1.79","1.7a","1.7b","1.7d","1.80","1.82","1.83","1.84","1.85","1.88","1.8b","1.8c","1.8d","1.8e","1.8f","1.90","1.92","1.93","1.94","1.96","1.99","1.9a","1.9b","1.9c","1.9d","1.a0","1.a1","1.a4","1.a5","1.a6","1.a9","1.aa","1.ab","1.ac","1.ae","1.af","1.b2","1.b3","1.b4","1.b5","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.bf","1.c0","1.c3","1.c4","1.c5","1.c6","1.c7","1.c8","1.ca","1.cc","1.cd","1.ce","1.d0","1.d2","1.d3","1.d4","1.d6","1.d7","1.d8","1.d9","1.da","1.db","1.dc","1.dd","1.de","1.df","1.e0","1.e1","1.e2","1.e5","1.e6","1.e8","1.ea","1.ec","1.ee","1.ef","1.f0","1.f1","1.f3","1.f5","1.f6","1.f7","1.f8","1.f9","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}
Error EBUSY: unsafe to stop osd(s)
unsafe to stop osd(s)
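Because the failing call above still writes JSON to stdout, a caller should read stdout before deciding what the exit code means. A minimal Python sketch, assuming the `ceph` binary is on `PATH` and using the `--max` and `-f json` options shown in these examples; `interpret` and `ok_to_stop` are hypothetical helper names, and no particular non-zero exit code for EBUSY is assumed.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def interpret(returncode: int, stdout: str) -> Tuple[bool, Optional[dict]]:
    """Return (ok, parsed_json); a non-zero exit code may still carry JSON output."""
    parsed = json.loads(stdout) if stdout.strip() else None
    return returncode == 0, parsed


def ok_to_stop(osd_ids: List[int], max_osds: int) -> Tuple[bool, Optional[dict]]:
    """Run 'ceph osd ok-to-stop <ids> --max <n> -f json' without raising on failure."""
    cmd = ["ceph", "osd", "ok-to-stop", *map(str, osd_ids),
           "--max", str(max_osds), "-f", "json"]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return interpret(proc.returncode, proc.stdout)
```

Under these assumptions, the failing example would come back as `(False, {...})` with `bad_become_inactive` filled in, rather than as an exception with the details lost.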

sebastian-philipp · 2021-02-16T16:54:52Z

ceph/src/pybind/mgr/cephadm/services/osd.py

Lines 354 to 376 in 2588fda

```python
def find_osd_stop_threshold(self, osds: List["OSD"]) -> Optional[List["OSD"]]:
    """
    Cut osd_id list in half until it's ok-to-stop

    :param osds: list of osd_ids
    :return: list of ods_ids that can be stopped at once
    """
    if not osds:
        return []
    while not self.ok_to_stop(osds):
        if len(osds) <= 1:
            # can't even stop one OSD, aborting
            self.mgr.log.info(
                "Can't even stop one OSD. Cluster is probably busy. Retrying later..")
            return []

        # This potentially prolongs the global wait time.
        self.mgr.event.wait(1)
        # splitting osd_ids in half until ok_to_stop yields success
        # maybe popping ids off one by one is better here..depends on the cluster size I guess..
        # There's a lot of room for micro adjustments here
        osds = osds[len(osds) // 2:]
    return osds
```
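For comparison, the halving loop above can be run outside cephadm by injecting the ok-to-stop check as a callable. This standalone version drops the logging and the one-second wait; `find_stop_threshold` and the `ok_to_stop` callable are illustrative names, not cephadm API. Note that each iteration keeps the upper half (`osds[len(osds) // 2:]`), so the OSDs at the front of the list are the ones dropped.

```python
from typing import Callable, List, Sequence


def find_stop_threshold(osds: Sequence[int],
                        ok_to_stop: Callable[[Sequence[int]], bool]) -> List[int]:
    """Keep the upper half of the list until the predicate accepts it."""
    osds = list(osds)
    if not osds:
        return []
    while not ok_to_stop(osds):
        if len(osds) <= 1:
            return []  # not even a single OSD can be stopped
        osds = osds[len(osds) // 2:]
    return osds
```

With a predicate that accepts at most two OSDs, `find_stop_threshold(list(range(1, 9)), lambda s: len(s) <= 2)` narrows 8 → 4 → 2 and returns `[7, 8]` after three calls to the predicate. With `--max`, a single `ok-to-stop` call can instead report a batch directly.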

leseb

Would a --bucket make sense to limit to a failure domain? It will allow the tool to stop at a given leaf.

tchaikov · 2021-02-18T15:17:13Z

@xxhdx1985126 is this something you are after in #39335 ?

liewegas · 2021-02-18T15:30:11Z

> Would a --bucket make sense to limit to a failure domain? It will allow the tool to stop at a given leaf.

I thought about this, but I don't think the hierarchy levels are relevant. You might have really big hosts and still only want to restart 10-20 osds at a time. Or, you might have smaller hosts, and want to restart lots of osds across several hosts (but within the same rack). You probably don't want to restart an entire rack of OSDs at once, though. --max NUM seems sufficient for this... with something like 10 or 20. That'll take a while on larger clusters, but I think that's okay. And we can make it a tunable if you really want to make things go quickly...
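With `--max` set to something like 10 or 20, a hypothetical upgrade driver could walk the OSD list in batches, asking about the first OSD not yet handled and taking whatever `osds` the command returns. In the sketch below, `query(start_osd, max_osds)` is an assumed stand-in for running `ceph osd ok-to-stop <start_osd> --max <max_osds> -f json` and returning the `osds` field when `ok_to_stop` is true (an empty list otherwise). The restart itself and the wait for recovery between batches are left to the caller.

```python
from typing import Callable, Iterator, List


def rolling_batches(all_osds: List[int],
                    query: Callable[[int, int], List[int]],
                    max_osds: int) -> Iterator[List[int]]:
    """Yield successive batches of OSDs that `query` reports as safe to stop."""
    remaining = list(all_osds)
    while remaining:
        # ask about the first remaining OSD and let --max widen the batch
        batch = [osd for osd in query(remaining[0], max_osds) if osd in remaining]
        if not batch:
            break  # nothing safe to stop right now; retry later
        yield batch  # caller restarts these OSDs and waits for recovery before resuming
        remaining = [osd for osd in remaining if osd not in batch]
```

Each batch removes at least one OSD from `remaining`, so the loop terminates; on a large cluster a small `max_osds` simply means more rounds.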

src/mgr/DaemonServer.cc

xxhdx1985126 · 2021-02-19T03:01:33Z

> @xxhdx1985126 is this something you are after in #39335 ?

Um... not quite, we need ok-to-stop to allow stopping osds when pgs of replicated pools are already degraded

liewegas · 2021-02-19T14:49:29Z

The main question I have is whether we should expand the JSON output to have more structure, e.g.

{
   "ok-to-stop": true,
   "osds": [ ... ]
}

or

{
   "ok-to-stop": false,
   "osds_considered": [...],
   "pgs_going_inactive": [...]
}

or similar? I'm not a big fan of the weird mix of stderr-for-humans and stdout-for-machines

sebastian-philipp · 2021-02-19T18:19:56Z

in any case it probably should return the osds that are ok to stop 😄

Right now, the "ok-to-stop" condition is relatively rigorous: it allows stopping an OSD only if no PG on it is non-active or degraded. But there are situations in which an OSD is part of a degraded PG and the PG still has > min_size complete replicas after the OSD is stopped.

In 9750061, we changed from considering just acting to using avail_no_missing (OSDs that have no missing objects). When the projected pg_acting is constructed this way, we can safely compare to min_size... even for a PG marked degraded.

Fixes: https://tracker.ceph.com/issues/49392
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
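The relaxed check can be illustrated per PG: of the acting OSDs, count only those that are not being stopped and have no missing objects (the `avail_no_missing` idea), then compare with `min_size`. This is a sketch under assumptions: `missing` is a hypothetical map from OSD id to its number of missing objects for this PG, `pg_stays_active` is an illustrative name, and the sketch treats the PG as staying active when at least `min_size` such OSDs remain.

```python
from typing import Dict, Iterable, Set


def pg_stays_active(acting: Iterable[int], missing: Dict[int, int],
                    stopping: Set[int], min_size: int) -> bool:
    """Count acting OSDs that survive the stop and have no missing objects."""
    avail_no_missing = [
        osd for osd in acting
        if osd not in stopping and missing.get(osd, 0) == 0
    ]
    return len(avail_no_missing) >= min_size
```

For a PG with acting set `[1, 2, 3]` where OSD 3 is missing objects and `min_size` is 2, stopping OSD 3 passes (OSDs 1 and 2 remain complete) while stopping OSD 1 fails (only OSD 2 is complete), even though the PG is degraded in both cases.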

Given an initial (set of) OSD(s), if possible, provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together (for example, to upgrade).

Signed-off-by: Sage Weil <sage@newdream.net>

liewegas · 2021-02-20T17:30:48Z

success:

$ bin/ceph osd ok-to-stop 5 -f json --max 20

{"ok_to_stop":true,"osds":[0,1,5],"num_ok_pgs":163,"num_not_ok_pgs":0,"ok_become_degraded":["1.0","1.1","1.2","1.3","1.4","1.8","1.9","1.a","1.b","1.c","1.e","1.f","1.10","1.12","1.14","1.15","1.17","1.18","1.19","1.1a","1.1b","1.1c","1.1d","1.1f","1.20","1.21","1.22","1.23","1.27","1.28","1.29","1.2a","1.2c","1.2d","1.2e","1.2f","1.30","1.31","1.32","1.36","1.37","1.39","1.3b","1.3c","1.3d","1.3e","1.3f","1.40","1.41","1.42","1.44","1.45","1.46","1.47","1.48","1.49","1.4a","1.4b","1.4e","1.50","1.52","1.54","1.57","1.59","1.5a","1.5e","1.60","1.62","1.63","1.67","1.69","1.6e","1.6f","1.70","1.71","1.72","1.73","1.75","1.76","1.79","1.7a","1.7c","1.82","1.83","1.84","1.86","1.87","1.88","1.8a","1.8b","1.8c","1.8e","1.8f","1.90","1.91","1.95","1.97","1.98","1.99","1.9a","1.9b","1.9e","1.9f","1.a0","1.a2","1.a3","1.a4","1.a5","1.a6","1.a7","1.a8","1.ae","1.af","1.b0","1.b3","1.b4","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.c1","1.c5","1.c6","1.c7","1.c8","1.ca","1.ce","1.cf","1.d0","1.d4","1.d5","1.d7","1.d8","1.d9","1.db","1.e0","1.e1","1.e3","1.e5","1.e6","1.e7","1.e8","1.e9","1.ea","1.eb","1.ec","1.ef","1.f1","1.f2","1.f3","1.f5","1.f6","1.f8","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}

failure:

$ bin/ceph osd ok-to-stop 5 6 7 8 -f json --max 20

{"ok_to_stop":false,"osds":[5,6,7,8],"num_ok_pgs":176,"num_not_ok_pgs":9,"bad_become_inactive":["1.10","1.16","1.21","1.27","1.3e","1.5f","1.a7","1.c1","1.e9"],"ok_become_degraded":["1.0","1.1","1.2","1.4","1.5","1.6","1.7","1.8","1.9","1.a","1.d","1.11","1.12","1.13","1.15","1.18","1.1a","1.1e","1.20","1.22","1.23","1.24","1.26","1.28","1.29","1.2a","1.2c","1.2d","1.2f","1.31","1.32","1.33","1.35","1.36","1.37","1.38","1.3a","1.3c","1.3d","1.3f","1.40","1.41","1.42","1.43","1.44","1.45","1.46","1.47","1.48","1.49","1.4b","1.4c","1.4e","1.4f","1.51","1.52","1.53","1.54","1.55","1.57","1.59","1.5b","1.5c","1.5d","1.63","1.65","1.68","1.69","1.6a","1.6c","1.6d","1.6e","1.6f","1.70","1.71","1.73","1.74","1.75","1.79","1.7a","1.7b","1.7d","1.80","1.82","1.83","1.84","1.85","1.88","1.8b","1.8c","1.8d","1.8e","1.8f","1.90","1.92","1.93","1.94","1.96","1.99","1.9a","1.9b","1.9c","1.9d","1.a0","1.a1","1.a4","1.a5","1.a6","1.a9","1.aa","1.ab","1.ac","1.ae","1.af","1.b2","1.b3","1.b4","1.b5","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.bf","1.c0","1.c3","1.c4","1.c5","1.c6","1.c7","1.c8","1.ca","1.cc","1.cd","1.ce","1.d0","1.d2","1.d3","1.d4","1.d6","1.d7","1.d8","1.d9","1.da","1.db","1.dc","1.dd","1.de","1.df","1.e0","1.e1","1.e2","1.e5","1.e6","1.e8","1.ea","1.ec","1.ee","1.ef","1.f0","1.f1","1.f3","1.f5","1.f6","1.f7","1.f8","1.f9","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}
Error EBUSY: unsafe to stop osd(s)
unsafe to stop osd(s)

Include specifics about which PGs are affected, which PGs prevent us from being ok to stop, etc. The primary downside I see here is that success and failure output will look more similar to a human user.

Signed-off-by: Sage Weil <sage@newdream.net>
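A sketch of how per-PG results could be bucketed into the fields shown in the example output. It assumes a hypothetical `projected` map from PG id to the number of replicas left after stopping `osds`, plus a single `size`/`min_size` pair: PGs below `min_size` go to `bad_become_inactive`, and PGs at or above `min_size` but below `size` go to `ok_become_degraded`. The real counting in `DaemonServer.cc` (including already-inactive and creating/deleting PGs) is not reproduced here; `build_report` is an illustrative name.

```python
from typing import Dict, List


def build_report(osds: List[int], projected: Dict[str, int],
                 size: int, min_size: int) -> dict:
    """projected maps pgid -> number of replicas left after stopping `osds`."""
    ok_degraded: List[str] = []
    bad_inactive: List[str] = []
    num_ok = 0
    for pgid, remaining in projected.items():
        if remaining < min_size:
            bad_inactive.append(pgid)  # PG would go inactive
        else:
            num_ok += 1
            if remaining < size:
                ok_degraded.append(pgid)  # still active, but (more) degraded
    report = {
        "ok_to_stop": not bad_inactive,
        "osds": sorted(osds),
        "num_ok_pgs": num_ok,
        "num_not_ok_pgs": len(bad_inactive),
    }
    if bad_inactive:  # only present on failure, as in the examples
        report["bad_become_inactive"] = bad_inactive
    report["ok_become_degraded"] = ok_degraded
    return report
```

Keeping the same key set for success and failure is what makes both outputs look alike to a human, which is the downside mentioned above.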

src/mgr/DaemonServer.cc

qa/standalone/misc/ok-to-stop.sh

Currently, in the case where the mon returns a command error code, we print the error stream and the "Error ..." message but not the command output. Usually there isn't any, so we haven't noticed until now, but there is no reason why we shouldn't return both an error code and some output.

Restructure the code so that the error message goes *after* the JSON output, where it will be a bit more obvious to the user (if the stdout scrolled the terminal, for instance). (This is not a change in behavior, since previously we weren't seeing the stdout at all.)

Signed-off-by: Sage Weil <sage@newdream.net>
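The ordering described here (JSON on stdout first, then the `Error ...` line on stderr, with the exit code still non-zero) can be sketched as follows. The function names and the message format are assumptions modeled on the `Error EBUSY: unsafe to stop osd(s)` line in the examples; this is not the `ceph` CLI source, and it prints only that first error line.

```python
import json
import sys
from typing import Optional, Tuple


def format_result(retcode: int, output: Optional[dict],
                  errmsg: str) -> Tuple[str, str, int]:
    """Render (stdout_text, stderr_text, exit_code) for one command result."""
    out_text = json.dumps(output, separators=(",", ":")) + "\n" if output is not None else ""
    err_text = f"Error {errmsg}\n" if retcode != 0 else ""
    return out_text, err_text, retcode


def print_result(retcode: int, output: Optional[dict], errmsg: str) -> int:
    """Write the JSON first, then the error line, and return the exit code."""
    out_text, err_text, code = format_result(retcode, output, errmsg)
    sys.stdout.write(out_text)
    sys.stdout.flush()  # emit the JSON before the error line reaches the terminal
    sys.stderr.write(err_text)
    return code
```

Flushing stdout before writing to stderr matters because the two streams are buffered independently; without it, the error line could appear above the JSON on a terminal.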

Signed-off-by: Sage Weil <sage@newdream.net>

github-actions bot added core mgr labels Feb 12, 2021

liewegas requested review from leseb, neha-ojha and tchaikov February 12, 2021 19:33

liewegas added wip-sage-testing needs-qa wip-sage2-testing and removed wip-sage-testing labels Feb 16, 2021

leseb reviewed Feb 17, 2021

neha-ojha reviewed Feb 19, 2021

src/mgr/DaemonServer.cc Outdated

xxhdx1985126 and others added 2 commits February 20, 2021 09:52

liewegas force-pushed the ok-to-stop-max branch from 66565a2 to 5a5e5e5 February 20, 2021 17:30

xxhdx1985126 mentioned this pull request Feb 21, 2021

mgr: relax ok-to-stop condition #39335

Closed

3 tasks

liewegas force-pushed the ok-to-stop-max branch from 5a5e5e5 to 42ef3d4 February 22, 2021 18:48

neha-ojha reviewed Feb 22, 2021

src/mgr/DaemonServer.cc

github-actions bot added the tests label Feb 22, 2021

jdurgin approved these changes Feb 23, 2021

neha-ojha approved these changes Feb 23, 2021

tchaikov reviewed Feb 23, 2021

qa/standalone/misc/ok-to-stop.sh

github-actions bot added the documentation label Feb 23, 2021

tchaikov approved these changes Feb 23, 2021

liewegas added wip-sage-testing and removed wip-sage2-testing labels Feb 25, 2021

liewegas added 3 commits February 26, 2021 13:11

src/test/osd/safe-to-destroy: adjust test

2e15607

Signed-off-by: Sage Weil <sage@newdream.net>

doc/man/8/ceph: document --max option

98f1be8

Signed-off-by: Sage Weil <sage@newdream.net>

liewegas force-pushed the ok-to-stop-max branch from 53c2646 to 98f1be8 February 26, 2021 19:11

liewegas merged commit 5e197a2 into ceph:master Feb 27, 2021

liewegas mentioned this pull request Feb 27, 2021

pacific: mgr: add --max <n> to 'osd ok-to-stop' command #39737

Merged

liewegas deleted the ok-to-stop-max branch February 27, 2021 15:16

smithfarm mentioned this pull request Mar 9, 2021

octopus: mgr: relax osd ok-to-stop condition on degraded pgs #39887

Merged

smithfarm mentioned this pull request Apr 8, 2021

nautilus: mgr: add --max <n> to 'osd ok-to-stop' command #40676

Merged

liewegas merged 6 commits into ceph:master from liewegas:ok-to-stop-max

7 participants
