mgr/cephadm: redeploy daemons deployed using old image after upgrade#39435
mgr/cephadm: redeploy daemons deployed using old image after upgrade#39435liewegas merged 2 commits intoceph:masterfrom
Conversation
sebastian-philipp
left a comment
There was a problem hiding this comment.
What I don't like here is: there is no persistent state to track those upgrades, this makes reasoning about this code somewhat complicated in case there is an unexpected failover while we're within _post_upgrade_redeploys
One idea would be to do something like
for d in daemons:
if d.daemon_type == 'mgr':
self.mgr.cache.schedule_daemon_action(d.hostname, d.name(), 'redeploy')But this would have the downside of doing upgrade steps after the upgrade has already completed. I don't like this either.
🤷♂️
src/pybind/mgr/cephadm/upgrade.py
Outdated
| assert d.daemon_type is not None | ||
| assert d.daemon_id is not None | ||
| assert d.hostname is not None |
There was a problem hiding this comment.
I really need to fix the types. having those members being optional is irritating.
a5e89a2 to
4feea4e
Compare
|
Now overhauled to construct a list of the daemons that need to be redeployed as part of the upgrade state. As each daemon is redeployed it is removed from the list and the upgrade state is saved again so we don't have to redeploy those daemons again in case of a fail over. |
|
jenkins test make check |
|
if I remember correctly, @liewegas had the idea of using the meta data stored along the daemons to know the ceph version that deployed the daemons, right? |
Yes, I think the idea was to store what mgr version deployed the daemon in the unit.meta file from liewegas@9b9cc97 and then that could be included as part of the DaemonDesciption. Although, I don't think that change adding the unit.meta file exists in master. |
|
see https://github.com/ceph/ceph/pull/39550. @sebastian-philipp i wonder if we should merge this part without the autotuning bits? |
I tried to take a "stateless" approach to _do_upgrade so that it will always reconcile the current cluster state with our desired version, avoiding any complexity from intervening mgr restarts, canceled/restarted upgrades, restarted upgrades to different versions, and so on... |
|
#39644 is merged |
4feea4e to
d89037f
Compare
|
This now makes use of the unit.meta file. The container digest of the mgr used to deploy the daemon is stored in the unit.meta file and then, after the upgrade process, all daemons who were not deployed by a mgr using the new container are redeployed. |
d89037f to
4cb3578
Compare
|
Lots of errors:
I'm a little hesitant to merge this with the broken upgrade suite. |
|
hold on |
|
rerun: with --ceph wip-swagner-testing-2021-03-11-1320 --suite-repo http://github.com/sebastian-philipp/ceph.git --suite-branch wip-swagner-testing-2021-03-11-1320-rm-iscsi |
4cb3578 to
6e79690
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
6e79690 to
0fa7004
Compare
| 'service_name': daemon_spec.service_name, | ||
| 'ports': daemon_spec.ports, | ||
| 'ip': daemon_spec.ip, | ||
| 'deployed_by': self.mgr.get_active_mgr_digests(), |
There was a problem hiding this comment.
I wonder if instead of digests we should just the ceph version string, e.g. 16.1.0-917-g08ab4ae8. it's easier to get that the digests (we can drop get_active_mgr_digets() above) and it is also something a human can make sense of.
There was a problem hiding this comment.
I like the idea 👍. I'll try it out and if there's no issues I'll swap over to that.
0fa7004 to
7ab9749
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
…t.meta For use in upgrade. It's useful to see if daemon was deployed by mgr running same container digest as container we are upgrading to. Signed-off-by: Adam King <adking@redhat.com>
| @@ -23,7 +23,8 @@ class CephadmNoImage(Enum): | |||
| # NOTE: order important here as these are used for upgrade order | |||
| CEPH_TYPES = ['mgr', 'mon', 'crash', 'osd', 'mds', 'rgw', 'rbd-mirror', 'cephfs-mirror'] | |||
| GATEWAY_TYPES = ['iscsi', 'nfs'] | |||
There was a problem hiding this comment.
Perhaps in a separate patch, do we always want to move iscsi and nfs back into the CEPH_TYPES? So that their ceph.confs are updated?
There was a problem hiding this comment.
I think that can be done at the same time we fix the reconfigs during upgrade. That's why they got their own group to begin with.
Add extra check that daemons were deployed by mgr using new image during upgrade. Makes sure unit.run file for all daemons are updated if they changed between old and new images. Fixes: https://tracker.ceph.com/issues/49013 Signed-off-by: Adam King <adking@redhat.com>
7ab9749 to
83f0ca9
Compare
daemons not upgraded during the upgrade call and mgr daemons that could
have been upgraded by a mgr running the old pre-upgrade image need to be
redeployed after the rest of the upgrade is complete in case something
important has changed in the deployment between the old image and new
image
Fixes: https://tracker.ceph.com/issues/49013
Signed-off-by: Adam King adking@redhat.com
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox