mgr: add --max <n> to 'osd ok-to-stop' command #39455
ceph/src/pybind/mgr/cephadm/services/osd.py Lines 354 to 376 in 2588fda
leseb
left a comment
Would a --bucket make sense to limit to a failure domain? It will allow the tool to stop at a given leaf.
@xxhdx1985126 is this something you are after in #39335?
I thought about this, but I don't think the hierarchy levels are relevant. You might have really big hosts and still only want to restart 10-20 osds at a time. Or, you might have smaller hosts, and want to restart lots of osds across several hosts (but within the same rack). You probably don't want to restart an entire rack of OSDs at once, though.
Um... not quite, we need ok-to-stop to allow stopping osds when pgs of replicated pools are already degraded.
The main question I have is whether we should expand the JSON output to have more structure, e.g. or or similar? I'm not a big fan of the weird mix of stderr-for-humans and stdout-for-machines.
In any case it probably should return the osds that are ok to stop 😄
Right now, the "ok-to-stop" condition is relatively rigorous: it allows stopping an osd only if no PG on it is non-active or degraded. But there are situations in which an OSD is part of a degraded pg and the pg would still have > min_size complete replicas after the OSD is stopped. In 9750061, we changed from considering just acting to using avail_no_missing (OSDs that have no missing objects). When the projected pg_acting is constructed this way, we can safely compare to min_size... even for a PG marked degraded.
Fixes: https://tracker.ceph.com/issues/49392
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
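To make the check described in this commit message concrete, here is a minimal Python sketch. It is not the actual mgr code: the function name `pgs_blocking_stop`, the dict layout, and the threshold (a PG counts as safe while at least `min_size` complete copies remain) are assumptions made for illustration. Only `avail_no_missing` and `min_size` are terms taken from the message.

```python
def pgs_blocking_stop(pgs, to_stop):
    """Return the ids of PGs that would drop below min_size.

    pgs: list of dicts with hypothetical keys
         'pgid', 'avail_no_missing' (OSD ids holding a complete copy),
         and 'min_size'.
    to_stop: set of OSD ids we would like to stop.
    """
    blocking = []
    for pg in pgs:
        # Project the acting set: only OSDs with no missing objects count,
        # minus the ones we intend to stop.
        projected = [o for o in pg["avail_no_missing"] if o not in to_stop]
        # Assumption: the PG stays available with >= min_size complete copies.
        if len(projected) < pg["min_size"]:
            blocking.append(pg["pgid"])
    return blocking


if __name__ == "__main__":
    example_pgs = [
        {"pgid": "1.0", "avail_no_missing": [0, 1, 2], "min_size": 2},
        # A degraded PG: one replica is missing objects, so only two count.
        {"pgid": "1.1", "avail_no_missing": [1, 2], "min_size": 2},
    ]
    print(pgs_blocking_stop(example_pgs, {0}))
    print(pgs_blocking_stop(example_pgs, {1}))
```

Because the projection starts from complete copies only, the same comparison works whether or not the PG is currently marked degraded, which is the point the message makes.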
Given an initial (set of) osd(s), provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together to (for example) upgrade.
Signed-off-by: Sage Weil <sage@newdream.net>
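One way to picture the batching described above is a greedy expansion: start from the initial OSDs and keep adding candidates while the combined set is still safe to stop. The sketch below is an assumption about the general approach, not the implementation in this PR; `expand_batch`, `candidates`, and the `is_safe` callback are hypothetical names.

```python
def expand_batch(initial, candidates, max_osds, is_safe):
    """Greedily grow `initial` to at most `max_osds` OSDs that are safe together.

    is_safe: callable taking a set of OSD ids and returning True if stopping
             all of them at once leaves every PG available (hypothetical hook).
    Returns an empty list if the initial set is not safe on its own.
    """
    batch = list(initial)
    if not is_safe(set(batch)):
        return []
    for osd in candidates:
        if len(batch) >= max_osds:
            break
        if osd in batch:
            continue
        # Keep the candidate only if the whole batch stays safe with it.
        if is_safe(set(batch) | {osd}):
            batch.append(osd)
    return batch
```

A greedy pass does not necessarily find the largest possible batch, but it needs only one safety check per candidate, which fits the goal of quickly identifying a larger batch for something like an upgrade.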
66565a2 to 5a5e5e5
success: failure: |
Include specifics about which pgs are affected, which pgs prevent us from being ok to stop, etc. The primary downside I see here is that a success and failure output will look more similar to a human user.
Signed-off-by: Sage Weil <sage@newdream.net>
5a5e5e5 to 42ef3d4
Currently, in the case where the mon returns a command error code, we print the error stream and Error ... message but not the command output. Usually there isn't any, so we haven't noticed until now, but there is no reason why we shouldn't return both an error code and some output. Restructure the code so that the error message goes *after* the JSON output, where it will be a bit more obvious to the user (if the stdout scrolled the terminal, for instance). (This is not a change in behavior since previously we weren't seeing the stdout at all.)
Signed-off-by: Sage Weil <sage@newdream.net>
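The ordering this commit describes can be sketched as follows. This is an illustrative stand-in, not the code from the ceph CLI: `report_result` and its parameters are hypothetical, and the exact wording of the error line is an assumption.

```python
import sys


def report_result(ret, outbuf, outs, out=None, err=None):
    """Print command output first, then any error message.

    ret:    return code from the command (negative means failure here).
    outbuf: command output, e.g. a JSON document (may be empty).
    outs:   status or error string (may be empty).
    """
    out = out or sys.stdout
    err = err or sys.stderr
    # Always emit the output, even when the command failed.
    if outbuf:
        out.write(outbuf if outbuf.endswith("\n") else outbuf + "\n")
    if ret < 0:
        # The error goes last so it is still visible if the output scrolled.
        err.write("Error: {}\n".format(outs))
        return 1
    if outs:
        err.write(outs + "\n")
    return 0
```

Writing the error after the output means a long JSON dump cannot push the failure message off the top of the terminal.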
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
53c2646 to 98f1be8
Given an initial (set of) osd(s), provide up to N OSDs that can be stopped together without making PGs become unavailable. This can be used to quickly identify large(r) batches of OSDs that can be stopped together to (for example) upgrade.
Adjust the command output to dump structured JSON so that we can include
Note that this required some CLI changes:
For example, a successful return:
and a failed command: