mgr: add --max <n> to 'osd ok-to-stop' command#39455

Merged
liewegas merged 6 commits into ceph:master from liewegas:ok-to-stop-max
Feb 27, 2021

@liewegas
Member

@liewegas liewegas commented Feb 12, 2021

Given an initial (set of) osd(s), provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together to (for example) upgrade.

Adjust the command output to dump structured JSON so that we can include

  • which osd(s) are safe to stop together (since it might include more than what was provided on the command line)
  • which pgs would become degraded or become more degraded
  • on failure, which pgs would become inactive, are already inactive, or are creating/deleting

Note that this required some CLI changes:

  • ceph command now prints stdout (the json output) when the exit code is non-zero

For example, a successful return:

$ bin/ceph osd ok-to-stop 5 --max 20

{"ok_to_stop":true,"osds":[0,1,5],"num_ok_pgs":163,"num_not_ok_pgs":0,"ok_become_degraded":["1.0","1.1","1.2","1.3","1.4","1.8","1.9","1.a","1.b","1.c","1.e","1.f","1.10","1.12","1.14","1.15","1.17","1.18","1.19","1.1a","1.1b","1.1c","1.1d","1.1f","1.20","1.21","1.22","1.23","1.27","1.28","1.29","1.2a","1.2c","1.2d","1.2e","1.2f","1.30","1.31","1.32","1.36","1.37","1.39","1.3b","1.3c","1.3d","1.3e","1.3f","1.40","1.41","1.42","1.44","1.45","1.46","1.47","1.48","1.49","1.4a","1.4b","1.4e","1.50","1.52","1.54","1.57","1.59","1.5a","1.5e","1.60","1.62","1.63","1.67","1.69","1.6e","1.6f","1.70","1.71","1.72","1.73","1.75","1.76","1.79","1.7a","1.7c","1.82","1.83","1.84","1.86","1.87","1.88","1.8a","1.8b","1.8c","1.8e","1.8f","1.90","1.91","1.95","1.97","1.98","1.99","1.9a","1.9b","1.9e","1.9f","1.a0","1.a2","1.a3","1.a4","1.a5","1.a6","1.a7","1.a8","1.ae","1.af","1.b0","1.b3","1.b4","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.c1","1.c5","1.c6","1.c7","1.c8","1.ca","1.ce","1.cf","1.d0","1.d4","1.d5","1.d7","1.d8","1.d9","1.db","1.e0","1.e1","1.e3","1.e5","1.e6","1.e7","1.e8","1.e9","1.ea","1.eb","1.ec","1.ef","1.f1","1.f2","1.f3","1.f5","1.f6","1.f8","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}

and a failed command:

$ bin/ceph osd ok-to-stop 5 6 7 8 --max 20

{"ok_to_stop":false,"osds":[5,6,7,8],"num_ok_pgs":176,"num_not_ok_pgs":9,"bad_become_inactive":["1.10","1.16","1.21","1.27","1.3e","1.5f","1.a7","1.c1","1.e9"],"ok_become_degraded":["1.0","1.1","1.2","1.4","1.5","1.6","1.7","1.8","1.9","1.a","1.d","1.11","1.12","1.13","1.15","1.18","1.1a","1.1e","1.20","1.22","1.23","1.24","1.26","1.28","1.29","1.2a","1.2c","1.2d","1.2f","1.31","1.32","1.33","1.35","1.36","1.37","1.38","1.3a","1.3c","1.3d","1.3f","1.40","1.41","1.42","1.43","1.44","1.45","1.46","1.47","1.48","1.49","1.4b","1.4c","1.4e","1.4f","1.51","1.52","1.53","1.54","1.55","1.57","1.59","1.5b","1.5c","1.5d","1.63","1.65","1.68","1.69","1.6a","1.6c","1.6d","1.6e","1.6f","1.70","1.71","1.73","1.74","1.75","1.79","1.7a","1.7b","1.7d","1.80","1.82","1.83","1.84","1.85","1.88","1.8b","1.8c","1.8d","1.8e","1.8f","1.90","1.92","1.93","1.94","1.96","1.99","1.9a","1.9b","1.9c","1.9d","1.a0","1.a1","1.a4","1.a5","1.a6","1.a9","1.aa","1.ab","1.ac","1.ae","1.af","1.b2","1.b3","1.b4","1.b5","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.bf","1.c0","1.c3","1.c4","1.c5","1.c6","1.c7","1.c8","1.ca","1.cc","1.cd","1.ce","1.d0","1.d2","1.d3","1.d4","1.d6","1.d7","1.d8","1.d9","1.da","1.db","1.dc","1.dd","1.de","1.df","1.e0","1.e1","1.e2","1.e5","1.e6","1.e8","1.ea","1.ec","1.ee","1.ef","1.f0","1.f1","1.f3","1.f5","1.f6","1.f7","1.f8","1.f9","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}
Error EBUSY: unsafe to stop osd(s)
unsafe to stop osd(s)
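
A caller that wants to consume this output could look roughly like the sketch below. It assumes the JSON arrives on stdout (with the `Error EBUSY` text going to stderr) and that a `ceph` binary is on `PATH`. The names `OkToStopResult`, `parse_ok_to_stop`, and `run_ok_to_stop` are hypothetical; only the JSON keys and the command line are taken from the examples above.

```python
import json
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class OkToStopResult:
    """Hypothetical wrapper around the keys seen in the sample output."""
    ok_to_stop: bool
    osds: List[int]
    num_ok_pgs: int = 0
    num_not_ok_pgs: int = 0
    ok_become_degraded: List[str] = field(default_factory=list)
    bad_become_inactive: List[str] = field(default_factory=list)


def parse_ok_to_stop(stdout: str) -> OkToStopResult:
    """Parse the JSON document; keys missing from the reply default to empty."""
    doc = json.loads(stdout)
    return OkToStopResult(
        ok_to_stop=bool(doc["ok_to_stop"]),
        osds=list(doc["osds"]),
        num_ok_pgs=doc.get("num_ok_pgs", 0),
        num_not_ok_pgs=doc.get("num_not_ok_pgs", 0),
        ok_become_degraded=list(doc.get("ok_become_degraded", [])),
        bad_become_inactive=list(doc.get("bad_become_inactive", [])),
    )


def run_ok_to_stop(osd_ids: List[int], max_osds: int) -> OkToStopResult:
    """Run the command and parse stdout regardless of the exit code."""
    cmd = ["ceph", "osd", "ok-to-stop", *map(str, osd_ids), "--max", str(max_osds)]
    # No check=True: a non-zero exit (e.g. EBUSY) still carries JSON on stdout.
    # If the command fails before producing output, json.loads will raise.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_ok_to_stop(proc.stdout)
```

With this shape, a failed check is an ordinary result whose `ok_to_stop` is false and whose `bad_become_inactive` lists the blocking PGs, rather than an exception.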

@sebastian-philipp
Contributor

sebastian-philipp commented Feb 16, 2021

def find_osd_stop_threshold(self, osds: List["OSD"]) -> Optional[List["OSD"]]:
    """
    Cut osd_id list in half until it's ok-to-stop
    :param osds: list of osd_ids
    :return: list of osd_ids that can be stopped at once
    """
    if not osds:
        return []
    while not self.ok_to_stop(osds):
        if len(osds) <= 1:
            # can't even stop one OSD, aborting
            self.mgr.log.info(
                "Can't even stop one OSD. Cluster is probably busy. Retrying later..")
            return []
        # This potentially prolongs the global wait time.
        self.mgr.event.wait(1)
        # splitting osd_ids in half until ok_to_stop yields success
        # maybe popping ids off one by one is better here..depends on the cluster size I guess..
        # There's a lot of room for micro adjustments here
        osds = osds[len(osds) // 2:]
    return osds

Member

@leseb leseb left a comment

Would a --bucket make sense to limit to a failure domain? It will allow the tool to stop at a given leaf.

@tchaikov
Contributor

@xxhdx1985126 is this something you are after in #39335 ?

@liewegas
Member Author

> Would a --bucket make sense to limit to a failure domain? It will allow the tool to stop at a given leaf.

I thought about this, but I don't think the hierarchy levels are relevant. You might have really big hosts and still only want to restart 10-20 osds at a time. Or, you might have smaller hosts, and want to restart lots of osds across several hosts (but within the same rack). You probably don't want to restart an entire rack of OSDs at once, though. --max NUM seems sufficient for this... with something like 10 or 20. That'll take a while on larger clusters, but I think that's okay. And we can make it a tunable if you really want to make things go quickly...
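
As a rough illustration of how an orchestrator might use `--max` for rolling restarts, here is a minimal sketch. The `probe` callable stands in for running `ceph osd ok-to-stop <seed> --max N` and returning the parsed JSON. Both `next_batch` and `probe` are hypothetical names, and restricting the reply to OSDs that are still pending is an assumption about what a caller would want, not something this PR does.

```python
from typing import Callable, Dict, List, Set

# probe(seed_osds, max_osds) -> parsed JSON reply of `osd ok-to-stop ... --max N`
Probe = Callable[[List[int], int], Dict]


def next_batch(pending: Set[int], probe: Probe, max_osds: int = 20) -> List[int]:
    """Pick the next group of pending OSDs that can be stopped together.

    Returns an empty list if nothing is pending or if even the seed OSD
    is not safe to stop right now.
    """
    if not pending:
        return []
    seed = min(pending)  # arbitrary but deterministic choice of seed
    reply = probe([seed], max_osds)
    if not reply.get("ok_to_stop"):
        return []
    # The reply may name OSDs beyond the seed; keep only those still pending
    # (assumption: already-restarted OSDs should not be restarted again).
    return [osd for osd in reply["osds"] if osd in pending]
```

A caller would restart the returned batch, wait for the cluster to recover, remove those ids from `pending`, and ask again.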

@xxhdx1985126
Contributor

> @xxhdx1985126 is this something you are after in #39335 ?

Um... not quite, we need ok-to-stop to allow stopping osds when pgs of replicated pools are already degraded

@liewegas
Member Author

The main question I have is whether we should expand the JSON output to have more structure, e.g.

{
   "ok-to-stop": true,
   "osds": [ ... ],
}

or

{
   "ok-to-stop": false,
   "osds_considered": [...],
   "pgs_going_inactive": [...],
}

or similar? I'm not a big fan of the weird mix of stderr-for-humans and stdout-for-machines

@sebastian-philipp
Contributor

in any case it probably should return the osds that are ok to stop 😄

xxhdx1985126 and others added 2 commits February 20, 2021 09:52
Right now, the "ok-to-stop" condition is relatively rigorous: it allows
stopping an OSD only if no PG on it is non-active or degraded. But there
are situations in which an OSD is part of a degraded PG and the PG would
still have > min_size complete replicas after the OSD is stopped.

In 9750061, we changed from considering
just acting to using avail_no_missing (OSDs that have no missing objects).
When the projected pg_acting is constructed this way, we can safely compare
to min_size... even for a PG marked degraded.

Fixes: https://tracker.ceph.com/issues/49392
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
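
The check described in this commit message can be sketched as follows. This is an illustration only, not the manager's actual code: `classify_pgs` is a hypothetical name, the input is a plain dict rather than real PG state, and it assumes a PG stays active as long as at least `min_size` OSDs with no missing objects remain (the exact comparison in the manager may differ).

```python
from typing import Dict, Iterable, List, Set, Tuple


def classify_pgs(
    pgs: Dict[str, Tuple[Set[int], int]],
    stopping: Iterable[int],
) -> Tuple[List[str], List[str]]:
    """Split affected PGs into (become_degraded, become_inactive).

    pgs maps pgid -> (avail_no_missing OSD ids, pool min_size).
    """
    stop = set(stopping)
    degraded: List[str] = []
    inactive: List[str] = []
    for pgid, (avail_no_missing, min_size) in sorted(pgs.items()):
        if not (avail_no_missing & stop):
            continue  # none of the OSDs being stopped serve this PG
        # Projected acting set: complete replicas left after the stop.
        projected = avail_no_missing - stop
        if len(projected) < min_size:
            inactive.append(pgid)   # assumed threshold, see lead-in
        else:
            degraded.append(pgid)   # loses a replica but stays active
    return degraded, inactive
```

Because the projected set is built only from OSDs that have no missing objects, the comparison against `min_size` remains meaningful even when the PG is already marked degraded.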
Given an initial (set of) osd(s), provide up to N OSDs that can be
stopped together without making PGs become unavailable.

This can be used to quickly identify large(r) batches of OSDs that can be
stopped together to (for example) upgrade.

Signed-off-by: Sage Weil <sage@newdream.net>
@liewegas
Member Author

success:

$ bin/ceph osd ok-to-stop 5 -f json --max 20

{"ok_to_stop":true,"osds":[0,1,5],"num_ok_pgs":163,"num_not_ok_pgs":0,"ok_become_degraded":["1.0","1.1","1.2","1.3","1.4","1.8","1.9","1.a","1.b","1.c","1.e","1.f","1.10","1.12","1.14","1.15","1.17","1.18","1.19","1.1a","1.1b","1.1c","1.1d","1.1f","1.20","1.21","1.22","1.23","1.27","1.28","1.29","1.2a","1.2c","1.2d","1.2e","1.2f","1.30","1.31","1.32","1.36","1.37","1.39","1.3b","1.3c","1.3d","1.3e","1.3f","1.40","1.41","1.42","1.44","1.45","1.46","1.47","1.48","1.49","1.4a","1.4b","1.4e","1.50","1.52","1.54","1.57","1.59","1.5a","1.5e","1.60","1.62","1.63","1.67","1.69","1.6e","1.6f","1.70","1.71","1.72","1.73","1.75","1.76","1.79","1.7a","1.7c","1.82","1.83","1.84","1.86","1.87","1.88","1.8a","1.8b","1.8c","1.8e","1.8f","1.90","1.91","1.95","1.97","1.98","1.99","1.9a","1.9b","1.9e","1.9f","1.a0","1.a2","1.a3","1.a4","1.a5","1.a6","1.a7","1.a8","1.ae","1.af","1.b0","1.b3","1.b4","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.c1","1.c5","1.c6","1.c7","1.c8","1.ca","1.ce","1.cf","1.d0","1.d4","1.d5","1.d7","1.d8","1.d9","1.db","1.e0","1.e1","1.e3","1.e5","1.e6","1.e7","1.e8","1.e9","1.ea","1.eb","1.ec","1.ef","1.f1","1.f2","1.f3","1.f5","1.f6","1.f8","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}

failure:

$ bin/ceph osd ok-to-stop 5 6 7 8 -f json --max 20

{"ok_to_stop":false,"osds":[5,6,7,8],"num_ok_pgs":176,"num_not_ok_pgs":9,"bad_become_inactive":["1.10","1.16","1.21","1.27","1.3e","1.5f","1.a7","1.c1","1.e9"],"ok_become_degraded":["1.0","1.1","1.2","1.4","1.5","1.6","1.7","1.8","1.9","1.a","1.d","1.11","1.12","1.13","1.15","1.18","1.1a","1.1e","1.20","1.22","1.23","1.24","1.26","1.28","1.29","1.2a","1.2c","1.2d","1.2f","1.31","1.32","1.33","1.35","1.36","1.37","1.38","1.3a","1.3c","1.3d","1.3f","1.40","1.41","1.42","1.43","1.44","1.45","1.46","1.47","1.48","1.49","1.4b","1.4c","1.4e","1.4f","1.51","1.52","1.53","1.54","1.55","1.57","1.59","1.5b","1.5c","1.5d","1.63","1.65","1.68","1.69","1.6a","1.6c","1.6d","1.6e","1.6f","1.70","1.71","1.73","1.74","1.75","1.79","1.7a","1.7b","1.7d","1.80","1.82","1.83","1.84","1.85","1.88","1.8b","1.8c","1.8d","1.8e","1.8f","1.90","1.92","1.93","1.94","1.96","1.99","1.9a","1.9b","1.9c","1.9d","1.a0","1.a1","1.a4","1.a5","1.a6","1.a9","1.aa","1.ab","1.ac","1.ae","1.af","1.b2","1.b3","1.b4","1.b5","1.b6","1.b7","1.b8","1.b9","1.bb","1.bc","1.bd","1.be","1.bf","1.c0","1.c3","1.c4","1.c5","1.c6","1.c7","1.c8","1.ca","1.cc","1.cd","1.ce","1.d0","1.d2","1.d3","1.d4","1.d6","1.d7","1.d8","1.d9","1.da","1.db","1.dc","1.dd","1.de","1.df","1.e0","1.e1","1.e2","1.e5","1.e6","1.e8","1.ea","1.ec","1.ee","1.ef","1.f0","1.f1","1.f3","1.f5","1.f6","1.f7","1.f8","1.f9","1.fa","1.fb","1.fc","1.fd","1.fe","1.ff"]}
Error EBUSY: unsafe to stop osd(s)
unsafe to stop osd(s)

Include specifics about which pgs are affected, which pgs prevent us from
being ok to stop, etc.

The primary downside I see here is that a success and failure output will
look more similar to a human user.

Signed-off-by: Sage Weil <sage@newdream.net>
@github-actions github-actions bot added the tests label Feb 22, 2021
Currently, when the mon returns a command error code, we print
the error stream and the Error ... message but not the command output.  Usually
there isn't any, so we haven't noticed until now, but there is no reason
why we shouldn't return both an error code and some output.

Restructure the code so that the error message goes *after* the JSON output,
where it will be a bit more obvious to the user (if the stdout scrolled
the terminal, for instance).  (This is not a change in behavior since
previously we weren't seeing the stdout at all.)

Signed-off-by: Sage Weil <sage@newdream.net>
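
The ordering described above can be sketched like this. It is not the actual `ceph` CLI code: `emit_result` and its parameters are hypothetical, the return code is assumed to be an errno value (possibly negated), and the 0/1 exit status is a simplification.

```python
import errno
import sys
from typing import TextIO


def emit_result(
    ret: int,
    outbuf: str,
    outs: str,
    stdout: TextIO = sys.stdout,
    stderr: TextIO = sys.stderr,
) -> int:
    """Print command output first, then the error message (if any).

    ret    -- command return code; non-zero means failure (assumed errno)
    outbuf -- command output, e.g. the JSON document
    outs   -- human-readable status string from the command
    """
    # Always print the output, even when the command failed.
    if outbuf:
        print(outbuf, file=stdout)
    if ret:
        # Error goes last so it stays visible if stdout scrolled the terminal.
        name = errno.errorcode.get(abs(ret), "Unknown")
        print(f"Error {name}: {outs}", file=stderr)
        return 1
    return 0
```

Keeping the JSON on stdout and the error on stderr means a script can still pipe the output into a JSON parser while a human sees the failure as the final line.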
7 participants