
mgr/cephadm: add count-per-host to PlacementSpec, and add support to scheduler#39979

Merged

liewegas merged 17 commits into ceph:master from liewegas:cephadm-count-per-host on Mar 15, 2021

Conversation

@liewegas
Member

@liewegas liewegas commented Mar 10, 2021

  • for now, this does not consider ports; that is next
  • refactors and simplifies the scheduling code to use fewer helpers and be more explicit
  • drops the "avoid even numbers of monitors" logic, which is IMO unnecessary and complicated

The next step is to handle port assignments...

Fixes: https://tracker.ceph.com/issues/47145
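
As a rough illustration of what count-per-host adds to placement expansion (a minimal sketch — `PlacementSketch` and `expand` are made-up names, not the actual cephadm `PlacementSpec` API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PlacementSketch:
    # hypothetical stand-in for PlacementSpec; field names are illustrative
    hosts: List[str]
    count: Optional[int] = None   # total daemon count across the cluster
    count_per_host: int = 1       # daemons per matched host (new in this PR)

def expand(p: PlacementSketch) -> List[str]:
    """Return one list entry per daemon slot to schedule."""
    if p.count is not None:
        # per this PR, count and count-per-host cannot be combined
        return p.hosts[:p.count]
    return [h for h in p.hosts for _ in range(p.count_per_host)]
```

For example, `expand(PlacementSketch(hosts=["a", "b"], count_per_host=2))` yields two slots on each host.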

@liewegas liewegas requested a review from a team as a code owner March 10, 2021 00:21
@liewegas liewegas changed the title to mgr/cephadm: add count-per-host to PlacementSpec, and add support to scheduler Mar 10, 2021
Contributor

@sebastian-philipp sebastian-philipp left a comment


Nothing in here raises red flags.

Comment on lines +212 to +214
# If none of the above and also no <count>
if self.spec.placement.count is None:
elif self.spec.placement.count is not None:
# backward compatibility: consider an empty placements to be the same pattern = *
Contributor


The comments are outdated now

Suggested change
# If none of the above and also no <count>
if self.spec.placement.count is None:
elif self.spec.placement.count is not None:
# backward compatibility: consider an empty placements to be the same pattern = *
elif self.spec.placement.count is not None:
# consider an empty placements to be the same pattern = *

Member Author


fixed, along with bare count-per-host:N handling

self.filter_new_host = filter_new_host
self.service_name = spec.service_name()
self.daemons = daemons
self.allow_colo = allow_colo
Contributor


Suggested change
self.allow_colo = allow_colo
self.allow_colo = self.mgr.cephadm_services[spec.service_type].allow_colo()

Avoids a new parameter?

Member Author


there's no self.mgr here AFAICS

Comment on lines -47 to -50
# gen seed off of self.spec to make shuffling deterministic
seed = hash(self.spec.service_name())
# shuffle for pseudo random selection
random.Random(seed).shuffle(host_pool)
Contributor


Can you remove the SimpleScheduler class please?

  1. It's now a super complicated way of calling host_pool[:count]
  2. SimpleScheduler's docstring is now misleading.

An alternative would be to leave it as it is, since it doesn't really do any harm at this point.

Member Author


Yeah, I was about to remove it, but I think there is still possibly a role for pluggable schedulers, but closer to the get_placement() stage. It's silly as-is, though!
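
For reference, the removed SimpleScheduler logic amounts to a deterministic shuffle plus a slice, as the quoted lines show. A minimal self-contained sketch (the function name here is made up; this is not the actual class):

```python
import random
from typing import List

def simple_place(service_name: str, host_pool: List[str], count: int) -> List[str]:
    # seed off the service name so repeated scheduling passes for the same
    # service pick the same hosts (note: str hash varies between interpreter
    # runs unless PYTHONHASHSEED is fixed)
    pool = list(host_pool)
    seed = hash(service_name)
    # shuffle for pseudo-random but reproducible selection, then slice
    random.Random(seed).shuffle(pool)
    return pool[:count]
```

Within one process this is stable: calling it twice for the same service returns the same hosts.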

@liewegas liewegas force-pushed the cephadm-count-per-host branch from ac1c68b to 4990342 Compare March 10, 2021 13:32
Member

@jmolmo jmolmo left a comment


The new "count-per-host" attribute needs to be documented in the placement specification section.

liewegas added 17 commits March 10, 2021 14:34
Ceph works just fine with an even number of monitors.

Upside: more copies of critical cluster data
Downside: we can tolerate only as many down mons as with N-1, and now
 we are slightly more likely to have a failing mon because we have one more
 that might fail.

On balance it's not clear that having one fewer mon is any better, so avoid
the confusion and complexity of second-guessing ourselves.

Signed-off-by: Sage Weil <sage@newdream.net>
Cannot be combined with 'count'.

Signed-off-by: Sage Weil <sage@newdream.net>
If we are told to deploy multiple instances on a host, do it.

Signed-off-by: Sage Weil <sage@newdream.net>
In the no-count cases, our job is simple: we have a set of hosts specified
via some other means (label, filter, explicit list) and simply need to
do N instances for each of those hosts.

Signed-off-by: Sage Weil <sage@newdream.net>
…xplicit placement

At least for now we don't have a reasonable way to stamp out N placements
when we're also constraining the name/network/ip--not in the general case.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
- simpler
- many/most callers already have the daemon list, so we save ourselves
duplicated effort

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Otherwise we may end up randomly doubling up on some hosts and leaving none on
others (when we have more than one placement per host).

Signed-off-by: Sage Weil <sage@newdream.net>
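
One way to read this commit: order candidate hosts by how many daemons they already run before slicing, so new slots spread evenly. A hedged sketch (the helper name and signature are made up for illustration):

```python
from collections import Counter
from typing import List

def spread_hosts(candidates: List[str], existing: List[str], count: int) -> List[str]:
    # existing: the host of each daemon already deployed.  Prefer less-loaded
    # hosts (ties broken by name for stability) so we don't double up on some
    # hosts while leaving others empty.
    load = Counter(existing)
    return sorted(candidates, key=lambda h: (load[h], h))[:count]
```

With hosts a/b/c where a already runs two daemons and b one, the next two slots go to c and b.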
For certain daemon types, we can deploy more than one per host (mds,
rbd-mirror, rgw).

Signed-off-by: Sage Weil <sage@newdream.net>
We already have to examine existing daemons to choose the placements.
There is no reason to make the caller call another method to (re)calculate
what the net additions and removals are.

Signed-off-by: Sage Weil <sage@newdream.net>
It no longer does anything except slice the array.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
… host_pattern

I think this is better for the same reason we made PlacementSpec() not
mean 'all hosts' by default.  If you really want N daemons for every host
in the cluster, be specific with 'count-per-host:2 *'.

Signed-off-by: Sage Weil <sage@newdream.net>
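
A minimal sketch of how a placement string like 'count-per-host:2 *' might be tokenized (an assumption for illustration, not the actual PlacementSpec parser):

```python
from typing import Optional, Tuple

def parse_placement(s: str) -> Tuple[int, Optional[str]]:
    # split whitespace-separated tokens: a 'count-per-host:N' prefix sets the
    # per-host count; any remaining token is taken as the host pattern
    count_per_host, pattern = 1, None
    for tok in s.split():
        if tok.startswith("count-per-host:"):
            count_per_host = int(tok.split(":", 1)[1])
        else:
            pattern = tok
    return count_per_host, pattern
```

Under this sketch a bare 'count-per-host:2' would leave the pattern as None, matching the commit's point that the pattern must be explicit.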
- Rename get_host_selection_size() to get_target_size() since the host
  part of the name was a bit misleading
- Take count-per-host into consideration.

Signed-off-by: Sage Weil <sage@newdream.net>
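
The renamed helper presumably reduces to something like the following (a sketch; the signature is illustrative, not the actual method):

```python
from typing import List, Optional

def get_target_size(count: Optional[int], count_per_host: int,
                    matched_hosts: List[str]) -> int:
    # an explicit total count wins; otherwise the target scales with the
    # number of matched hosts, count-per-host daemons on each
    if count is not None:
        return count
    return count_per_host * len(matched_hosts)
```

So a spec with count-per-host:2 over three matched hosts targets six daemons.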
@sebastian-philipp
Contributor

sebastian-philipp commented Mar 11, 2021

I think this is the cause of

https://pulpito.ceph.com/sage-2021-03-11_01:59:56-rados:cephadm-wip-sage-testing-2021-03-10-1754-distro-basic-smithi/5954929/

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 461, in _apply_all_services
    if self._apply_service(spec):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 595, in _apply_service
    forcename=slot.name)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 548, in get_unique_name
    f'name {daemon_type}.{forcename} already in use')
orchestrator._interface.OrchestratorValidationError: name mgr.x already in use

@tchaikov
Contributor

tchaikov commented Mar 12, 2021

2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: audit
2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 2021-03-12T06:06:
2021-03-12T06:06:20.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 19.737138+0000 mon.a (mon.
2021-03-12T06:06:20.406 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: 0) 100 : audit [INF]
2021-03-12T06:06:20.406 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:19 smithi012 conmon[23630]: from='mgr.14144 172.21.15.12:0/2372735735' entity='mgr.a' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/a/trash_purge_schedule"}]: dispatch
2021-03-12T06:06:21.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:20 smithi012 conmon[23630]: cluster 2021-03-12T06:06:19.575845+0000 mgr.a (mgr.14144) 30
2021-03-12T06:06:21.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:20 smithi012 conmon[23630]:  : cluster [DBG] pgmap v23: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:22.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:21 smithi012 conmon[23630]: cluster 2021-03-12T06:06:21.576319+0000 mgr.a (mgr.14144
2021-03-12T06:06:22.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:21 smithi012 conmon[23630]: ) 31 : cluster [DBG] pgmap v24: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:24.904 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:24 smithi012 conmon[23630]: cluster 2021-03-12T06:06:23.576707+0000 mgr.a (mgr.14144) 32
2021-03-12T06:06:24.905 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:24 smithi012 conmon[23630]:  : cluster [DBG] pgmap v25: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:27.404 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:26 smithi012 conmon[23630]: cluster 2021-03-12T06:06:
2021-03-12T06:06:27.405 INFO:journalctl@ceph.mon.a.smithi012.stdout:Mar 12 06:06:26 smithi012 conmon[23630]: 25.577088+0000 mgr.a (mgr.14144) 33 : cluster [DBG] pgmap v26: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
2021-03-12T06:06:28.749 INFO:teuthology.orchestra.run.smithi175.stderr:Error: error looking up supplemental groups for container 52cb72e8c6d8c41774a697dc79fd491ae5e7dea82a67a32b1eed1fbe6912cfb5: Unable to find group disk
2021-03-12T06:06:28.779 DEBUG:teuthology.orchestra.run:got remote process result: 126

in https://pulpito.ceph.com/kchai-2021-03-12_05:42:44-rados-wip-kefu-testing-2021-03-12-1106-distro-basic-smithi/5958482/

not sure if it's related.

I also ran into the orchestrator._interface.OrchestratorValidationError: name mgr.a already in use failure in the 2nd run above; see https://pulpito.ceph.com/kchai-2021-03-11_16:25:48-rados-wip-kefu-testing-2021-03-11-2158-distro-basic-gibba/5956186/ . But I failed to reproduce it in the 3rd run.

@liewegas liewegas merged commit 91b741b into ceph:master Mar 15, 2021