pacific: mgr: add --max <n> to 'osd ok-to-stop' command #39737

Merged
liewegas merged 7 commits into ceph:pacific from liewegas:pr-39455-pacific on Mar 4, 2021


@liewegas commented Feb 27, 2021

xxhdx1985126 and others added 6 commits February 27, 2021 09:15
Right now, the "ok-to-stop" condition is relatively strict: it allows
stopping an OSD only if no PG on it is inactive or degraded. But there
are situations in which an OSD is part of a degraded PG and the PG
still has > min_size complete replicas after the OSD is stopped.

In 9750061, we changed from considering just the acting set to using
avail_no_missing (the OSDs that have no missing objects). When the
projected pg_acting is constructed this way, we can safely compare it
to min_size, even for a PG marked degraded.

Fixes: https://tracker.ceph.com/issues/49392
Signed-off-by: Xuehan Xu <xxhdx1985126@gmail.com>
(cherry picked from commit 2f28fc5)
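The availability check described in this commit can be sketched in Python as follows. This is a minimal, hypothetical model, not the mgr's actual code: `PGInfo`, `pg_stays_available` and `ok_to_stop` are illustrative names, and the sketch assumes a PG stays available when its projected set of complete replicas still has at least `min_size` members (the message above writes "> min_size", so the exact boundary is an assumption here).

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class PGInfo:
    """Hypothetical, simplified view of one PG (not a real Ceph type)."""
    pgid: str
    min_size: int
    acting: Set[int] = field(default_factory=set)
    # Acting OSDs that have no missing objects (cf. avail_no_missing).
    avail_no_missing: Set[int] = field(default_factory=set)


def pg_stays_available(pg: PGInfo, stopping: Set[int]) -> bool:
    """True if the PG keeps at least min_size complete replicas (assumed rule)."""
    # Project the acting set from complete replicas only, so a degraded PG
    # is judged by the copies that can actually serve I/O.
    projected_acting = pg.avail_no_missing - stopping
    return len(projected_acting) >= pg.min_size


def ok_to_stop(pgs: Iterable[PGInfo], stopping: Set[int]) -> bool:
    # Only PGs whose acting set includes an OSD being stopped are affected.
    return all(
        pg_stays_available(pg, stopping)
        for pg in pgs
        if pg.acting & stopping
    )
```

For example, a degraded PG with acting set {0, 1, 2}, where OSD 2 is missing objects and `min_size` is 2, can still lose OSD 2 but not OSD 0.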
Given an initial (set of) OSD(s), provide, if possible, up to N OSDs that
can be stopped together without making any PGs unavailable.

This can be used to quickly identify larger batches of OSDs that can be
stopped together, for example during an upgrade.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 722f57d)
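A hypothetical sketch of how the `--max` expansion might work: start from the requested OSDs, then greedily add candidates while the whole batch stays safe to stop. The function name, the candidate order, the `is_ok` predicate, and counting the initial OSDs toward the limit are all assumptions; the actual mgr implementation may choose candidates differently.

```python
from typing import Callable, Iterable, List, Set


def maximize_ok_to_stop_set(
    initial: Iterable[int],
    candidates: Iterable[int],
    max_osds: int,
    is_ok: Callable[[Set[int]], bool],
) -> List[int]:
    """Greedily grow `initial` up to `max_osds` OSDs while it stays safe to stop.

    Returns an empty list if the initial set itself is not ok to stop.
    """
    chosen = set(initial)
    if not is_ok(chosen):
        return []
    for osd in candidates:
        # Assumption: the initial OSDs count toward the --max limit.
        if len(chosen) >= max_osds:
            break
        if osd in chosen:
            continue
        trial = chosen | {osd}
        # Keep the OSD only if the whole batch still leaves every PG available.
        if is_ok(trial):
            chosen = trial
    return sorted(chosen)
```

With the real command, a request might look like `ceph osd ok-to-stop 0 --max 8`. A greedy pass never revisits rejected candidates, so it finds a large safe batch cheaply rather than the largest possible one.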
Include specifics about which PGs are affected, which PGs prevent us from
being ok to stop, etc.

The primary downside I see here is that success and failure output will
look more similar to a human user.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 791952c)
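One way to picture the richer output is a single report shape used for both outcomes, as in the hypothetical sketch below. The function `build_report` and the field names (`ok_to_stop`, `osds`, `num_ok_pgs`, `num_not_ok_pgs`, `blocking_pgs`) are illustrative assumptions, not the exact keys the mgr emits. Because both outcomes share one shape, only the boolean verdict tells them apart, which is the downside noted in this commit.

```python
import json
from typing import Dict, Set


def build_report(osds: Set[int], pg_status: Dict[str, str]) -> str:
    """Summarize per-PG outcomes as JSON for both success and failure.

    `pg_status` maps a pgid to "ok" or "blocking" (hypothetical labels).
    """
    blocking = sorted(p for p, s in pg_status.items() if s == "blocking")
    report = {
        # Hypothetical field names, chosen for illustration only.
        "ok_to_stop": not blocking,
        "osds": sorted(osds),
        "num_ok_pgs": len(pg_status) - len(blocking),
        "num_not_ok_pgs": len(blocking),
        "blocking_pgs": blocking,
    }
    return json.dumps(report, sort_keys=True)
```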
Currently, when the mon returns a command error code, we print the error
stream and the "Error ..." message but not the command output. Usually
there isn't any, so we haven't noticed until now, but there is no reason
why we shouldn't return both an error code and some output.

Restructure the code so that the error message goes *after* the JSON output,
where it will be a bit more obvious to the user (if stdout scrolled the
terminal, for instance). (This is not a change in behavior, since
previously we weren't seeing stdout at all.)

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9425eee)
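The output ordering can be sketched as follows, assuming the usual negative-errno return convention. `format_command_result` and `emit` are hypothetical helpers, not the `ceph` CLI's actual functions: the command output is always kept, and the error line is produced last, with stdout flushed before anything is written to stderr.

```python
import errno
import sys
from typing import Tuple


def format_command_result(ret: int, outbuf: str, outs: str) -> Tuple[int, str, str]:
    """Return (exit_code, stdout_text, stderr_text) for a command reply.

    `ret` uses the negative-errno convention: 0 on success, e.g. -EBUSY on failure.
    """
    # Keep the command's own output (e.g. a JSON report) even on error.
    out = outbuf if not outbuf or outbuf.endswith("\n") else outbuf + "\n"
    if ret < 0:
        code = -ret
        name = errno.errorcode.get(code, "Unknown")
        return code, out, f"Error {name}: {outs}\n"
    err = outs + "\n" if outs else ""
    return 0, out, err


def emit(ret: int, outbuf: str, outs: str) -> int:
    code, out, err = format_command_result(ret, outbuf, outs)
    sys.stdout.write(out)
    # Flush stdout first so the error message lands *after* the output.
    sys.stdout.flush()
    sys.stderr.write(err)
    return code
```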
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 2e15607)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 98f1be8)
In 791952c we switched to returning JSON on both success and failure
to describe which PGs are affected or are blocking the ability to
stop/restart OSDs. Do the same for the case where some PG states are
unknown (e.g., just after a mgr restart) so that the cephadm upgrade
process can unconditionally expect a JSON result.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 2cce165)
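On the consumer side, a caller such as the cephadm upgrade logic can then parse the output unconditionally. The sketch below is hypothetical: `parse_ok_to_stop` is an illustrative name and `ok_to_stop` is an assumed field name. It treats anything that is not valid JSON, or that lacks an explicit positive verdict, as "not ok to stop", which is the conservative choice when PG states are still unknown.

```python
import json
from typing import Any, Dict, Tuple


def parse_ok_to_stop(retcode: int, stdout: str) -> Tuple[bool, Dict[str, Any]]:
    """Interpret ok-to-stop output, assuming JSON on success *and* failure.

    Returns (ok, report); unparsable output is treated as "not ok".
    """
    try:
        report = json.loads(stdout)
    except json.JSONDecodeError:
        return False, {}
    if not isinstance(report, dict):
        return False, {}
    # Require both a zero exit code and an explicit positive verdict
    # ("ok_to_stop" is an assumed key name).
    ok = retcode == 0 and report.get("ok_to_stop") is True
    return ok, report
```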
@liewegas merged commit ca0faa0 into ceph:pacific on Mar 4, 2021

4 participants