mgr/cephadm: add iscsi and nfs to upgrade process #39677
liewegas merged 2 commits into ceph:master
src/pybind/mgr/cephadm/module.py
Outdated
    # need iscsi and nfs as well in order to upgrade them
    if daemon_type not in CEPH_TYPES and daemon_type not in ['nfs', 'iscsi']:
Is this hunk still needed?
What do you mean by this comment? Do you think we can just remove the check and allow custom images for any daemon type?
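For readers following the thread, the condition in the hunk can be read as a plain membership test. The following is a minimal sketch, not the module's code: the contents of `CEPH_TYPES` are illustrative assumptions, and `accepts_custom_image` and `EXTRA_UPGRADE_TYPES` are hypothetical names introduced only for this example. Only the condition itself comes from the hunk.

```python
# Illustrative contents only (an assumption); the real CEPH_TYPES is defined
# in the cephadm mgr module and may differ.
CEPH_TYPES = {"mgr", "mon", "crash", "osd", "mds", "rgw", "rbd-mirror"}

# The two extra daemon types the reviewed hunk lets through.
EXTRA_UPGRADE_TYPES = ["nfs", "iscsi"]


def accepts_custom_image(daemon_type: str) -> bool:
    """Hypothetical helper that mirrors the condition in the reviewed hunk."""
    # need iscsi and nfs as well in order to upgrade them
    if daemon_type not in CEPH_TYPES and daemon_type not in EXTRA_UPGRADE_TYPES:
        return False
    return True


if __name__ == "__main__":
    for name in ("osd", "nfs", "iscsi", "grafana"):
        print(name, accepts_custom_image(name))
```

Removing the check entirely, as the reply asks about, would amount to this helper returning `True` for every daemon type.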
Please verify this in teuthology. Please add either iscsi or nfs (or both) to https://github.com/ceph/ceph/blob/master/qa/suites/rados/cephadm/upgrade/fixed-2.yaml. See ceph/qa/suites/rados/cephadm/workunits/task/test_orch_cli.yaml, lines 14 to 17 in a171b32.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
There was an issue here with the way the new ok-to-stop functions interact with upgrade. We neither pause the upgrade nor notify the user when an ok-to-stop check fails. So if you have, for example, only one nfs daemon, the ok-to-stop check fails perpetually, because it tests a static condition that only the user can change, and the upgrade goes on forever without telling the user anything is wrong. The new ok-to-stop functions make sense for something like putting a host in maintenance mode, but they simply don't work well with upgrade.

To avoid this, I opted to limit the ok-to-stop checks in upgrade to the daemons that had a defined ok-to-stop before host maintenance was added (mon, osd and mds, the ones with an actual ceph ok-to-stop command). If we don't want to do that, the ok-to-stop functions would need some change, such as a stronger force flag or awareness of when an upgrade is happening. Even without this PR adding iscsi and nfs to upgrade, I think the issue might already exist if there is only a single rgw daemon during an upgrade.
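The restriction described in this comment can be sketched as follows. This is a simplified model under stated assumptions, not the cephadm upgrade code: `OK_TO_STOP_TYPES`, `should_check_ok_to_stop`, and `upgrade_pass` are hypothetical names, and the `ok_to_stop` callable stands in for whatever check the orchestrator would actually run.

```python
from typing import Callable, List, Tuple

# Daemon types the comment names as having an actual ceph ok-to-stop command.
OK_TO_STOP_TYPES = {"mon", "osd", "mds"}

Daemon = Tuple[str, str]  # (daemon_type, daemon_id)


def should_check_ok_to_stop(daemon_type: str) -> bool:
    """During upgrade, only gate on ok-to-stop for the listed types."""
    return daemon_type in OK_TO_STOP_TYPES


def upgrade_pass(
    daemons: List[Daemon],
    ok_to_stop: Callable[[Daemon], bool],
) -> Tuple[List[Daemon], List[Daemon]]:
    """Run one simulated pass; return (upgraded, still_waiting)."""
    upgraded: List[Daemon] = []
    waiting: List[Daemon] = []
    for daemon in daemons:
        daemon_type, _ = daemon
        # Types outside OK_TO_STOP_TYPES skip the check, so a static
        # condition such as "only one nfs daemon" cannot stall the upgrade.
        if should_check_ok_to_stop(daemon_type) and not ok_to_stop(daemon):
            waiting.append(daemon)
            continue
        upgraded.append(daemon)
    return upgraded, waiting


if __name__ == "__main__":
    daemons = [("mon", "a"), ("nfs", "foo"), ("osd", "1")]
    # Pretend only the osd is currently safe to stop.
    print(upgrade_pass(daemons, lambda d: d[0] == "osd"))
```

In this model the lone nfs daemon is upgraded without a check, while a mon that is not safe to stop is deferred to a later pass.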
@sebastian-philipp I added iscsi directly to the fixed-2.yaml file. How would adding nfs work? I'm assuming I can't just copy the cephfs_test_runner block from the example you linked into the fixed-2.yaml. |
If the caps change from the old version to the new one, it causes issues in the upgrade. This allows the caps to be updated. Currently only seeing this with iscsi, but changing it for the others as a precaution.

Signed-off-by: Adam King <adking@redhat.com>
Fixes: https://tracker.ceph.com/issues/49462

Signed-off-by: Adam King <adking@redhat.com>
Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit e233ed0)

Conflicts:
    src/pybind/mgr/cephadm/services/cephadmservice.py
        Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677
    src/pybind/mgr/cephadm/services/monitoring.py
        Fixed a few conflicts because the upstream master contains some improvements around handling URLs, e.g. ceph#43579

Fixes: https://tracker.ceph.com/issues/54502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 4f14993)

Conflicts:
    src/pybind/mgr/cephadm/services/cephadmservice.py
        Fixed conflict because `get_keyring_with_caps` has not been backported to octopus: ceph#39677
    src/pybind/mgr/cephadm/services/monitoring.py
        Fixed a few conflicts because the master contains some improvements around handling URLs, e.g. ceph#43579
Fixes: https://tracker.ceph.com/issues/49462
Signed-off-by: Adam King <adking@redhat.com>
The way the keyring is grabbed changed because the iscsi caps changed at some point, so upgrading from an old version might mean there is an already existing keyring with different caps. The `auth get-or-create` command fails if you pass an existing entity with different caps, and the iscsi daemon then could not be redeployed (in which case it cannot be upgraded either).

Reconfig due to monmap changes is avoided in the middle of the upgrade because it made problems similar to https://tracker.ceph.com/issues/49013 possible, where the daemon being reconfigured had an old unit.run file that was incompatible with changes in the updated systemd unit file. In that case, specifically for nfs, trying to reconfig the daemon would fail (reconfig does not redeploy the unit.run file for the daemon) and you would have to wait until the call timed out. This made the upgrade significantly slower, since it would happen every time the serve loop was entered, from when the mons were upgraded until nfs was upgraded.
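The keyring problem described above can be illustrated with an in-memory stand-in for the auth database. This is a sketch under assumptions, not the PR's implementation: `FakeAuthStore`, `CapsMismatch`, and the body of `get_keyring_with_caps` are hypothetical, and the stand-in models only the behavior stated in the description (get-or-create fails for an existing entity when different caps are passed).

```python
from typing import Dict, Optional


class CapsMismatch(Exception):
    """Raised by the stand-in when an entity exists with different caps."""


class FakeAuthStore:
    """In-memory stand-in for the cluster auth database (an assumption)."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}
        self._caps: Dict[str, Dict[str, str]] = {}

    def get_or_create(self, entity: str,
                      caps: Optional[Dict[str, str]] = None) -> str:
        if entity not in self._keys:
            self._keys[entity] = f"key-for-{entity}"
            self._caps[entity] = dict(caps or {})
            return self._keys[entity]
        # Existing entity: passing different caps is an error.
        if caps is not None and self._caps[entity] != caps:
            raise CapsMismatch(entity)
        return self._keys[entity]

    def set_caps(self, entity: str, caps: Dict[str, str]) -> None:
        self._caps[entity] = dict(caps)

    def caps_of(self, entity: str) -> Dict[str, str]:
        return dict(self._caps[entity])


def get_keyring_with_caps(store: FakeAuthStore, entity: str,
                          caps: Dict[str, str]) -> str:
    """Fetch (or create) the key without caps, then update the caps."""
    key = store.get_or_create(entity)  # no caps passed, so no mismatch
    store.set_caps(entity, caps)       # bring caps up to date
    return key


if __name__ == "__main__":
    store = FakeAuthStore()
    store.get_or_create("client.iscsi.a", {"mon": "old-caps"})
    print(get_keyring_with_caps(store, "client.iscsi.a", {"mon": "new-caps"}))
    print(store.caps_of("client.iscsi.a"))
```

The design point is that fetching the key and setting the caps are separate steps, so an entity left over from an older version keeps its key while its caps are replaced.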