
mgr/cephadm: add count-per-host to PlacementSpec, and add support to scheduler#39979

Merged

liewegas merged 17 commits into ceph:master from liewegas:cephadm-count-per-host on Mar 15, 2021

Conversation

@liewegas
Member

@liewegas liewegas commented Mar 10, 2021

  • for now, this does not consider ports; that is next
  • refactors and simplifies the scheduling code to use fewer helpers and be more explicit
  • drops the "avoid even numbers of monitors" logic, which is IMO unnecessary and complicated

The next step is to handle port assignments...

Fixes: https://tracker.ceph.com/issues/47145
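
As a rough illustration of what count-per-host adds to placement expansion (a minimal sketch — `PlacementSketch` and `expand` are made-up names, not the actual cephadm `PlacementSpec` API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PlacementSketch:
    # hypothetical stand-in for PlacementSpec; field names are illustrative
    hosts: List[str]
    count: Optional[int] = None   # total daemon count across the cluster
    count_per_host: int = 1       # daemons per matched host (new in this PR)

def expand(p: PlacementSketch) -> List[str]:
    """Return one list entry per daemon slot to schedule."""
    if p.count is not None:
        # per this PR, count and count-per-host cannot be combined
        return p.hosts[:p.count]
    return [h for h in p.hosts for _ in range(p.count_per_host)]
```

For example, `expand(PlacementSketch(hosts=["a", "b"], count_per_host=2))` yields two slots on each host.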

@liewegas liewegas requested a review from a team as a code owner March 10, 2021 00:21
@liewegas liewegas changed the title to mgr/cephadm: add count-per-host to PlacementSpec, and add support to scheduler Mar 10, 2021
Contributor

@sebastian-philipp sebastian-philipp left a comment


Nothing in here raises red flags.

Comment on lines +212 to +214
# If none of the above and also no <count>
if self.spec.placement.count is None:
elif self.spec.placement.count is not None:
# backward compatibility: consider an empty placements to be the same pattern = *
Contributor


The comments are outdated now

Suggested change
# If none of the above and also no <count>
if self.spec.placement.count is None:
elif self.spec.placement.count is not None:
# backward compatibility: consider an empty placements to be the same pattern = *
elif self.spec.placement.count is not None:
# consider an empty placements to be the same pattern = *

Member Author


fixed, along with bare count-per-host:N handling

self.filter_new_host = filter_new_host
self.service_name = spec.service_name()
self.daemons = daemons
self.allow_colo = allow_colo
Contributor


Suggested change
self.allow_colo = allow_colo
self.allow_colo = self.mgr.cephadm_services[spec.service_type].allow_colo()

Avoids a new parameter?

Member Author


there's no self.mgr here AFAICS

Comment on lines -47 to -50
# gen seed off of self.spec to make shuffling deterministic
seed = hash(self.spec.service_name())
# shuffle for pseudo random selection
random.Random(seed).shuffle(host_pool)
Contributor


Can you remove the SimpleScheduler class please?

  1. It's now a super complicated way of calling host_pool[:count]
  2. SimpleScheduler's docstring is now misleading.

An alternative would be to leave it as it is, since it doesn't really do any harm at this point.

Member Author


Yeah, I was about to remove it, but I think there is still possibly a role for pluggable schedulers, but closer to the get_placement() stage. It's silly as-is, though!
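
For reference, the removed SimpleScheduler logic amounts to a deterministic shuffle plus a slice, as the quoted lines show. A minimal self-contained sketch (the function name here is made up; this is not the actual class):

```python
import random
from typing import List

def simple_place(service_name: str, host_pool: List[str], count: int) -> List[str]:
    # seed off the service name so repeated scheduling passes for the same
    # service pick the same hosts (note: str hash varies between interpreter
    # runs unless PYTHONHASHSEED is fixed)
    pool = list(host_pool)
    seed = hash(service_name)
    # shuffle for pseudo-random but reproducible selection, then slice
    random.Random(seed).shuffle(pool)
    return pool[:count]
```

Within one process this is stable: calling it twice for the same service returns the same hosts.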

@liewegas liewegas force-pushed the cephadm-count-per-host branch from ac1c68b to 4990342 Compare March 10, 2021 13:32
Member

@jmolmo jmolmo left a comment


The new "count-per-host" attribute needs to be documented in the placement specification section.

liewegas added 17 commits March 10, 2021 14:34
Ceph works just fine with an even number of monitors.

Upside: more copies of critical cluster data
Downside: we can tolerate only as many down mons as with N-1, and now
 we are slightly more likely to have a failing mon because we have one more
 that might fail.

On balance it's not clear that having one fewer mon is any better, so avoid
the confusion and complexity of second-guessing ourselves.

Signed-off-by: Sage Weil <sage@newdream.net>
Cannot be combined with 'count'.

Signed-off-by: Sage Weil <sage@newdream.net>
If we are told to deploy multiple instances on a host, do it.

Signed-off-by: Sage Weil <sage@newdream.net>
In the no-count cases, our job is simple: we have a set of hosts specified
via some other means (label, filter, explicit list) and simply need to
do N instances for each of those hosts.

Signed-off-by: Sage Weil <sage@newdream.net>
…xplicit placement

At least for now we don't have a reasonable way to stamp out N placements
when we're also constraining the name/network/ip--not in the general case.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
- simpler
- many/most callers already have the daemon list, so we save ourselves
duplicated effort

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Otherwise we may end up randomly doubling up on some hosts and leaving none on
others (when we have more than one placement per host).

Signed-off-by: Sage Weil <sage@newdream.net>
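
One way to read this commit: order candidate hosts by how many daemons they already run before slicing, so new slots spread evenly. A hedged sketch (the helper name and signature are made up for illustration):

```python
from collections import Counter
from typing import List

def spread_hosts(candidates: List[str], existing: List[str], count: int) -> List[str]:
    # existing: the host of each daemon already deployed.  Prefer less-loaded
    # hosts (ties broken by name for stability) so we don't double up on some
    # hosts while leaving others empty.
    load = Counter(existing)
    return sorted(candidates, key=lambda h: (load[h], h))[:count]
```

With hosts a/b/c where a already runs two daemons and b one, the next two slots go to c and b.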
For certain daemon types, we can deploy more than one per host (mds,
rbd-mirror, rgw).

Signed-off-by: Sage Weil <sage@newdream.net>
We already have to examine existing daemons to choose the placements.
There is no reason to make the caller call another method to (re)calculate
what the net additions and removals are.

Signed-off-by: Sage Weil <sage@newdream.net>
It no longer does anything except slice the array.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
… host_pattern

I think this is better for the same reason we made PlacementSpec() not
mean 'all hosts' by default.  If you really want N daemons for every host
in the cluster, be specific with 'count-per-host:2 *'.

Signed-off-by: Sage Weil <sage@newdream.net>
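
A minimal sketch of how a placement string like 'count-per-host:2 *' might be tokenized (an assumption for illustration, not the actual PlacementSpec parser):

```python
from typing import Optional, Tuple

def parse_placement(s: str) -> Tuple[int, Optional[str]]:
    # split whitespace-separated tokens: a 'count-per-host:N' prefix sets the
    # per-host count; any remaining token is taken as the host pattern
    count_per_host, pattern = 1, None
    for tok in s.split():
        if tok.startswith("count-per-host:"):
            count_per_host = int(tok.split(":", 1)[1])
        else:
            pattern = tok
    return count_per_host, pattern
```

Under this sketch a bare 'count-per-host:2' would leave the pattern as None, matching the commit's point that the pattern must be explicit.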
- Rename get_host_selection_size() to get_target_size() since the host
  part of the name was a bit misleading
- Take count-per-host into consideration.

Signed-off-by: Sage Weil <sage@newdream.net>
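
The renamed helper presumably reduces to something like the following (a sketch; the signature is illustrative, not the actual method):

```python
from typing import List, Optional

def get_target_size(count: Optional[int], count_per_host: int,
                    matched_hosts: List[str]) -> int:
    # an explicit total count wins; otherwise the target scales with the
    # number of matched hosts, count-per-host daemons on each
    if count is not None:
        return count
    return count_per_host * len(matched_hosts)
```

So a spec with count-per-host:2 over three matched hosts targets six daemons.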
@sebastian-philipp
Contributor

sebastian-philipp commented Mar 11, 2021

I think this is the cause of

https://pulpito.ceph.com/sage-2021-03-11_01:59:56-rados:cephadm-wip-sage-testing-2021-03-10-1754-distro-basic-smithi/5954929/

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 461, in _apply_all_services
    if self._apply_service(spec):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 595, in _apply_service
    forcename=slot.name)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 548, in get_unique_name
    f'name {daemon_type}.{forcename} already in use')
orchestrator._interface.OrchestratorValidationError: name mgr.x already in use

@tchaikov
Contributor

tchaikov commented Mar 12, 2021

2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: audit
2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 2021-03-12T06:06:
2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 19.737138+0000 mon.a (mon.
2021-03-12T06:06:20.406 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 0) 100 : audit [INF]
2021-03-12T06:06:20.406 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: from='mgr.14144 172.21.15.12:0/2372735735' entity='mgr.a' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/a/trash_purge_schedule"}]: dispatch
2021-03-12T06:06:21.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:20 smithi012 conmon[23630]: cluster 2021-03-12T06:06:19.575845+0000 mgr.a (mgr.14144) 30
2021-03-12T06:06:21.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:20 smithi012 conmon[23630]:  : cluster [DBG] pgmap v23: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:22.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:21 smithi012 conmon[23630]: cluster 2021-03-12T06:06:21.576319+0000 mgr.a (mgr.14144
2021-03-12T06:06:22.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:21 smithi012 conmon[23630]: ) 31 : cluster [DBG] pgmap v24: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:24.904 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:24 smithi012 conmon[23630]: cluster 2021-03-12T06:06:23.576707+0000 mgr.a (mgr.14144) 32
2021-03-12T06:06:24.905 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:24 smithi012 conmon[23630]:  : cluster [DBG] pgmap v25: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:27.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:26 smithi012 conmon[23630]: cluster 2021-03-12T06:06:
2021-03-12T06:06:27.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:26 smithi012 conmon[23630]: 25.577088+0000 mgr.a (mgr.14144) 33 : cluster [DBG] pgmap v26: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:28.749 INFO:teuthology.orchestra.run.smithi175.stderr:Error: error looking up supplemental groups for container 52cb72e8c6d8c41774a697dc79fd491ae5e7dea82a67a32b1eed1fbe6912cfb5: Unable to find group disk
2021-03-12T06:06:28.779 DEBUG:teuthology.orchestra.run:got remote process result: 126

in https://pulpito.ceph.com/kchai-2021-03-12_05:42:44-rados-wip-kefu-testing-2021-03-12-1106-distro-basic-smithi/5958482/

not sure if it's related.

I also ran into the orchestrator._interface.OrchestratorValidationError: name mgr.a already in use failure in the 2nd run above; see https://pulpito.ceph.com/kchai-2021-03-11_16:25:48-rados-wip-kefu-testing-2021-03-11-2158-distro-basic-gibba/5956186/ . But I failed to reproduce it in the 3rd run.

@liewegas liewegas merged commit 91b741b into ceph:master Mar 15, 2021