
mgr/cephadm: make scheduling make sense#34633

Merged
sebastian-philipp merged 9 commits into ceph:master from sebastian-philipp:cephadm-total-scheduler
Jun 9, 2020

Conversation

@sebastian-philipp
Contributor

@sebastian-philipp sebastian-philipp commented Apr 19, 2020

Depends on


Consider this placement:

placement:
  hosts:
  - myhost
  count: 2

Previously, if there was already a daemon running on myhost, cephadm would schedule another daemon on a random other host.

Now, cephadm will only ever schedule one daemon on myhost.
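To make the new semantics concrete, here is a minimal, hypothetical sketch (the names are illustrative, not the actual cephadm API): the explicit host list acts as a hard bound on where daemons may run.

```python
# Hypothetical sketch of "placement as bound" (illustrative names,
# not the real cephadm scheduler).

def place(explicit_hosts, count, all_hosts):
    # Only hosts named in the placement are candidates; count can
    # never push daemons onto hosts outside the explicit list.
    candidates = [h for h in all_hosts if h in explicit_hosts]
    return candidates[:count]

# hosts: [myhost], count: 2 -> only one daemon is scheduled
print(place(['myhost'], 2, ['myhost', 'otherhost']))  # ['myhost']
```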

Open questions:

  • How can we merge this without breaking existing clusters?

TODO:

  • introduce a migration from the old to the new scheduler that changes the placement specs to reflect reality.

Also fixed:

HostAssignment(
    spec=ServiceSpec('mon', placement=PlacementSpec(
        hosts=['host1'],
        count=3,
    )),
    get_hosts_func=lambda _: ['host1', 'host2'],
    get_daemons_func=lambda _: []).place()
# returns
['host1', 'host1', 'host2']

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@sebastian-philipp sebastian-philipp requested a review from a team as a code owner April 19, 2020 14:04
@sebastian-philipp sebastian-philipp changed the title from "mgr/cephadm: verify explicit placement actually works." to "mgr/cephadm: make scheduling make sense" Apr 20, 2020
@sebastian-philipp sebastian-philipp changed the title from "mgr/cephadm: make scheduling make sense" to "[WIP]: mgr/cephadm: make scheduling make sense" Apr 20, 2020
@jschmid1
Contributor

discussion here: https://pad.ceph.com/p/orchestration-weekly

@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch 2 times, most recently from f4d6669 to 29a8399 Compare April 21, 2020 16:43
@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch 2 times, most recently from 457ce7a to 30785bb Compare May 6, 2020 13:52
@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch from 30785bb to 2e13612 Compare May 6, 2020 16:27
@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch from 2e13612 to cf1e104 Compare May 6, 2020 21:27
@sebastian-philipp sebastian-philipp changed the title [WIP]: mgr/cephadm: make scheduling make sense mgr/cephadm: make scheduling make sense May 6, 2020
@sebastian-philipp
Contributor Author

ping @jschmid1

Contributor

@jschmid1 jschmid1 left a comment

this needs a closer review; consider this a first round of reviews

)

new_spec = ServiceSpec.from_json(spec.to_json())
new_spec.placement = new_placement
Contributor

I can also see an issue where a user applies a cluster.yaml only when changing a single service. Since we converted the placement_glob (or pattern) to an explicit placement spec, they would probably have to export and overwrite their existing cluster.yaml.

I think this should be recommended somehow.

Contributor Author

Right, that's a big problem, but I didn't have a better answer.

Contributor

Having previews for all services would be a first step, I think. If placements changed completely, we could at least raise a warning before applying it.

Contributor Author

Both schedulers behave very similarly when it comes to assigning hosts. It's just that the old one assigned more hosts than the new one.

Contributor

I'm still struggling with this concept, but I suppose it's similar to k8s nodeName ...
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodename

Contributor Author

yes. But what's the alternative? What if the user mistakenly created a placement like

service_type: mon
placement:
    label: mymons
    count: 5

but only a single host was labeled as mymons? Should we simply remove all the mons? I'm also against being more elaborate here, as this increases the likelihood of bugs. I'd really be in favor of keeping this as simple as possible to stay safe.
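The label case can be sketched the same way (again with hypothetical names, not the real cephadm code): count is only an upper bound on the hosts that actually carry the label, so a mislabeled cluster degrades gracefully instead of triggering removals.

```python
# Illustrative sketch: label-based placement with count as an upper bound.

def candidates_for(label, count, host_labels):
    # host_labels maps hostname -> set of labels on that host.
    matching = [h for h, labels in host_labels.items() if label in labels]
    return matching[:count]

hosts = {'host1': {'mymons'}, 'host2': set(), 'host3': set()}
# count: 5, but only one host is labeled -> one candidate, nothing removed
print(candidates_for('mymons', 5, hosts))  # ['host1']
```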

@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch from cf1e104 to c1d9dc9 Compare May 8, 2020 07:53
@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label May 13, 2020
@sebastian-philipp
Contributor Author

spoiler: it doesn't.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Like,

```yaml
placement:
  hosts:
  - myhost
  count: 3
```
will always only schedule one daemon on `myhost`.

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch from 3a04af6 to a3dfaa0 Compare June 3, 2020 10:18
New scheduler that takes PlacementSpec as the bound and not as a recommendation.

This means we have to make sure we're not removing any daemons directly after
upgrading to the new scheduler.

There is a potential race here:

1. user updates the spec to remove daemons
2. mgr gets upgraded to the new scheduler before the old scheduler removed the daemon
3. now, we're converting the spec to explicit placement, thus reverting (1.)

I think this is ok.

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
@sebastian-philipp sebastian-philipp force-pushed the cephadm-total-scheduler branch from a3dfaa0 to b307021 Compare June 3, 2020 12:22
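The upgrade conversion described in the commit message above could look roughly like this (a hypothetical sketch, not the actual migration code): the spec is pinned to the hosts that currently run daemons, so the bound-style scheduler removes nothing right after the upgrade.

```python
# Hypothetical sketch of the old-spec -> explicit-placement migration.

def migrate_spec(old_placement, running_daemon_hosts):
    # Pin the placement to the hosts that already carry a daemon;
    # count stays as an upper bound.
    return {
        'hosts': sorted(set(running_daemon_hosts)),
        'count': old_placement.get('count'),
    }

old = {'count': 2}  # old "recommendation"-style spec
print(migrate_spec(old, ['host2', 'host1']))
# {'hosts': ['host1', 'host2'], 'count': 2}
```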
@sebastian-philipp
Contributor Author

ping @jschmid1 + @mgfritch wdyt?

@sebastian-philipp sebastian-philipp added wip-swagner-testing My Teuthology tests and removed needs-doc labels Jun 5, 2020
@sebastian-philipp
Contributor Author

@sebastian-philipp sebastian-philipp merged commit 3a757a4 into ceph:master Jun 9, 2020
@LenzGr
Contributor

LenzGr commented Jun 10, 2020

Unfortunately this change broke Ceph Dashboard: https://tracker.ceph.com/issues/45963
(the failing ceph dashboard backend API tests on this PR should have been a warning sign)

{
    'name': 'migration_current',
    'type': 'int',
    'default': None,
Member

None is not an integer, so it will break code that relies on the configured type. It would be better to choose an int value here, e.g. -1 or 0.
The regression introduced by this PR is https://tracker.ceph.com/issues/45963.
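A small sketch of the failure mode (simplified, not the actual mgr/dashboard code; the -1 sentinel is a hypothetical choice): a typed 'int' option whose default is None breaks any consumer that trusts the declared type, while an int sentinel keeps comparisons well-defined.

```python
# Illustrative: why default=None on an 'int' option is fragile.

OPTION = {'name': 'migration_current', 'type': 'int', 'default': -1}  # -1 = "never migrated" (hypothetical sentinel)

def needs_migration(value, latest=2):
    # With an int sentinel this comparison always works; with
    # default=None, `None < latest` raises TypeError on Python 3.
    return value < latest

print(needs_migration(OPTION['default']))  # True
```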

Contributor Author

hm.

When looking at the code:

def get_module_option(self, key, default=None):
    """
    Retrieve the value of a persistent configuration setting

    :param str key:
    :param default: the default value of the config if it is not found
    :return: str
    """
    r = self._ceph_get_module_option(key)
    if r is None:
        return self.MODULE_OPTION_DEFAULTS.get(key, default)
    else:
        return r

having an arbitrary default does seem to work fine. Having the dashboard treat the values differently than mgr_module.py seems odd to me.
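The fallback order in get_module_option can be demonstrated with a tiny self-contained mock (not the real MgrModule class): a stored value wins, then MODULE_OPTION_DEFAULTS, then the call-site default.

```python
# Minimal mock of the fallback chain shown above (not the real class).

class FakeModule:
    MODULE_OPTION_DEFAULTS = {'migration_current': None}

    def __init__(self, stored):
        self._stored = stored  # simulates _ceph_get_module_option()

    def get_module_option(self, key, default=None):
        r = self._stored.get(key)
        if r is None:
            return self.MODULE_OPTION_DEFAULTS.get(key, default)
        return r

print(FakeModule({'migration_current': 2}).get_module_option('migration_current'))  # 2
print(FakeModule({}).get_module_option('migration_current'))  # None (what the dashboard chokes on)
```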


``myfs`` across the cluster.

Then, in case there are less than three daemons deployed on the candidate
hosts, cephadm will then then randomly choose hosts for deploying new daemons.
Member

s/will then then randomly/will then randomly/
