
mgr/cephadm: mgr or mds scale-down should prefer non-active daemons #36485

Merged
sebastian-philipp merged 1 commit into ceph:master from adk3798:cephadm-44252
Aug 20, 2020

Conversation

Contributor

@adk3798 adk3798 commented Aug 5, 2020

When placing daemons during a mgr/mds scale-down, give preference
to the host with the active daemon so the active daemon is not
picked for removal

When removing daemons during a mgr/mds scale-down, prefer to remove
standby daemons so the active daemon is not killed

Fixes: https://tracker.ceph.com/issues/44252
Signed-off-by: Adam King adking@redhat.com
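The removal preference described in the commit message can be sketched as a simple sort over removal candidates. This is an illustrative sketch, not the actual cephadm scheduler code; the function name and the `(daemon_id, is_active)` pair representation are assumptions for the example.

```python
# Hypothetical sketch (not the real cephadm scheduler): when scaling
# down, sort candidates so standby daemons come first, keeping the
# active daemon alive as long as possible.

def daemons_to_remove(daemons, target_count):
    """Pick daemons to remove, preferring standbys over the active one.

    `daemons` is a list of (daemon_id, is_active) pairs; `is_active`
    mirrors the flag this PR adds to DaemonDescription.
    """
    n_remove = max(0, len(daemons) - target_count)
    # False sorts before True, so standby daemons are removed first.
    candidates = sorted(daemons, key=lambda d: d[1])
    return [d[0] for d in candidates[:n_remove]]
```

For example, scaling three mgrs down to one with `mgr.a` active removes the two standbys and leaves `mgr.a` running.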

@adk3798 adk3798 requested a review from a team as a code owner August 5, 2020 21:54
Contributor

@sebastian-philipp sebastian-philipp left a comment


I don't see any tests here. Would you mind creating a test in test_scheduling? I really want to avoid this breaking at some point in the future.

Contributor Author

adk3798 commented Aug 7, 2020

Swapped from passing the get_active_daemon function to the scheduler to having a new field in the DaemonDescription object that marks whether a daemon is the active one (this only applies to mgr and mds). Right now, I'm setting it in the _check_daemons function in the cephadm module, which runs near the end of the serve loop. It was discussed to set the flag in the get_daemons_with_volatile_status function in the inventory file, but I don't think that function is called often enough to expect the flag to be set when a scale-down happens.

I still need to make tests.
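The shape of the change described above could look roughly like this. This is an illustrative sketch only; the real DaemonDescription in cephadm has many more fields, and the helper function is hypothetical.

```python
# Illustrative sketch of an is_active flag on a daemon description
# (the real cephadm DaemonDescription carries many more fields).
from dataclasses import dataclass


@dataclass
class DaemonDescription:
    daemon_type: str
    daemon_id: str
    hostname: str
    is_active: bool = False  # set near the end of the serve loop for mgr/mds


def standbys(daemons):
    """Daemons that are safe to remove first during a scale-down."""
    return [d for d in daemons if not d.is_active]
```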

Contributor

mchangir commented Aug 7, 2020

@adk3798
The commit message confused me to no end.
If we are scaling down and we give preference to active daemons, then I'd think that daemons that are active are the ones actually killed. But it's the other way round.
I think the phrasing should be:
During scale down of mgr/mds daemons, do not kill active daemons and instead prefer or prioritize standby (non-active) daemons for killing.

I hope this makes sense.

Contributor Author

adk3798 commented Aug 7, 2020

> @adk3798
> The commit message confused me to no end.
> If we are scaling down and we give preference to active daemons, then I'd think that daemons that are active are the ones actually killed. But it's the other way round.
> I think the phrasing should be:
> During scale down of mgr/mds daemons, do not kill active daemons and instead prefer or prioritize standby (non-active) daemons for killing.
>
> I hope this makes sense.

The intention with that part of the message was to mirror the line from the Ceph tracker, "2. make the scheduler prefer active daemons when placing them." I can see how that's confusing, though, especially when the first line of the commit says it will prefer non-active daemons. I'll change it to speak only about preferring non-active daemons when removing them and leave out the language about placing.

@adk3798 adk3798 force-pushed the cephadm-44252 branch 2 times, most recently from 86e4ee7 to 7c0fad6 Compare August 7, 2020 18:50
Contributor Author

adk3798 commented Aug 7, 2020

Added some Python tests for placing on hosts when one of the daemons has the is_active flag set.
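A test of this kind might be shaped roughly as follows. This is a hypothetical pytest-style sketch, not the actual test_scheduling code; the toy `place` function and host/daemon names are assumptions for the example.

```python
# Hypothetical pytest-style sketch: when scaling from two daemons down
# to one, placement should keep the host whose daemon has is_active set.
def place(hosts, existing, count):
    """Toy scheduler: keep hosts running an active daemon first."""
    # not is_active -> active hosts sort first (False < True).
    keep = sorted(
        hosts,
        key=lambda h: not existing.get(h, {}).get('is_active', False),
    )
    return keep[:count]


def test_scale_down_prefers_active_host():
    existing = {
        'host1': {'daemon': 'mgr.a', 'is_active': False},
        'host2': {'daemon': 'mgr.b', 'is_active': True},
    }
    assert place(['host1', 'host2'], existing, 1) == ['host2']
```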

@sebastian-philipp
Contributor

jenkins test make check

@adk3798 adk3798 force-pushed the cephadm-44252 branch 2 times, most recently from 9bf3424 to 949bf30 Compare August 11, 2020 19:50
```python
if dd.daemon_type in ['grafana', 'iscsi', 'prometheus', 'alertmanager', 'nfs']:
    daemons_post[dd.daemon_type].append(dd)

if dd.daemon_type in ['mgr', 'mds']:
```
Contributor


Why only for mgr and mds? We also have get_active_daemon for grafana.

Contributor Author


@sebastian-philipp The reason I was originally very careful and only allowed mds or mgr daemons here was that I had left get_active_daemon undefined for many of the daemon types. I've refactored it a bit, so if it's called for a service that hasn't defined the function, it just gets back an empty DaemonDescription and then checks whether the daemon_id on the DaemonDescription it gets back matches the daemon being checked. Thoughts?

Also, the reasoning here is the same reason I was limiting it to mgr and mds elsewhere. I've marked those conversations as resolved but can go back to them if this system is insufficient.
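The fallback described above could be sketched roughly like this. This is an illustrative sketch, not the actual cephadm service code; the class and function names beyond DaemonDescription and get_active_daemon are assumptions for the example.

```python
# Hypothetical sketch of the fallback described above: the base service
# returns an empty DaemonDescription, so comparing its daemon_id against
# a real daemon never matches for services with no notion of "active".
class DaemonDescription:
    def __init__(self, daemon_id=None):
        self.daemon_id = daemon_id


class CephadmService:
    def get_active_daemon(self, daemons):
        # Default: no active daemon; empty description.
        return DaemonDescription()


class MgrService(CephadmService):
    def get_active_daemon(self, daemons):
        # Stand-in for querying the cluster for the active mgr;
        # here we just assume the first daemon is active.
        return daemons[0]


def is_active(service, daemons, dd):
    return service.get_active_daemon(daemons).daemon_id == dd.daemon_id
```

With this shape, a service that never overrides get_active_daemon safely reports every daemon as non-active.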

Contributor


lgtm. I'll try to get this through QA asap

Contributor Author

adk3798 commented Aug 17, 2020

jenkins test make check

When removing daemons during a mgr/mds scale-down, prefer to remove
standby daemons so the active daemon is not killed

Fixes: https://tracker.ceph.com/issues/44252
Signed-off-by: Adam King <adking@redhat.com>
@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Aug 18, 2020
sebastian-philipp added a commit to sebastian-philipp/ceph that referenced this pull request Aug 19, 2020
I'm facing a problem that `ceph orch daemon redeploy <our own mgr>` really fails badly:
1. client sends `redeploy` to the mon
2. mon sends `redeploy` to the mgr
3. we synchronously call _create_daemon() with our own mgr. this never completes
4. the mon re-sends the command to the mgr as soon as it starts
5. goto 2.
and we're in an evil endless loop

I'm now trying to 1. make ok-to-stop always return False in this case and 2. call ok-to-stop from `orch daemon redeploy`,
but this causes some problems, as we then no longer fail over the mgr and thus never undeploy the active MGR.

Turns out this is really closely related to Adam King's ceph#36485 (preferring to remove standby daemons instead of active ones),
but that doesn't solve `orch daemon redeploy`. So for now, I think making ok-to-stop always false for our own mgr is the wrong idea;
I should rely on ceph#36485 for that instead.

but how do I solve `daemon redeploy`? I think we _have_ to failover before redeploying our own mgr
I mean, self._create_daemon is the last call we ever execute, because then we're gone and another MGR takes over.

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Contributor

@sebastian-philipp sebastian-philipp merged commit 35b0261 into ceph:master Aug 20, 2020
