mgr/cephadm: set OSD flags during upgrade#50508
Conversation
| self.log.info( | ||
| f"Unset global osd flag {flag}") | ||
|
|
||
| def get_set_global_osd_flags(self) -> List[str]: |
There was a problem hiding this comment.
I find the name of this function confusing, it sort of sounds like it can both get and set flags. I'd go for something more like get_current_global_osd_flags.
| Option( | ||
| 'upgrade_osd_flags', | ||
| type='str', | ||
| default='noout,nodeep-scrub,noscrub', |
There was a problem hiding this comment.
my 2 cents: Is it actually a good idea to set these? I mean as long as the upgrade is reasonable fast, everything will be fine. I'm just worried about the degraded reliability of the whole cluster if the upgrade takes longer than usual. Why do we set noout for all osds, even if we don't touch them for e.g. a month if the upgrade is stuck somewhere?
|
Reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-04-04_05:19:12-orch:cephadm-wip-adk-testing-2023-04-03-1541-distro-default-smithi/ After reruns, 6 failures. All 6 failures were on tests being added/adjusted by PRs in the run (2 keepalive-only nfs tests, 2 mon set crush location tests, 2 staggered upgrade tests) and those failures don't seem like they should block merging for any PRs in the run outside of those adjusting the tests. |
NOTE: This PR is one of the ones adjusting tests, specifically the staggered upgrade. Will need to keep working on the test changes before merging this. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
6d915ee to
1a0dd74
Compare
in this commit at least, this means the noout, noscrub, and nodeep-scrub flags, although it's configurable with the mgr/cephadm/osd_upgrade_flags conf option. The idea is that during upgrade there may be situations that cause OSDs to get marked down and out (see the linked tracker) and so it's good practice to have these set during upgrade. This also gets us in line with ceph-ansible, which set these flags itself during upgrade. Using the staggered upgrade test to test this since we need to be upgrading from a mgr with upgrade code that contains these changes. Fixes: https://tracker.ceph.com/issues/56670 Signed-off-by: Adam King <adking@redhat.com>
"ceph versions" has stopped being a reliable way of checking if daemons have been upgraded (see linked tracker). The idea of this commit is to instead rely on the container image id that cephadm tracks for each daemon, since we have more control over that. Fixes: https://tracker.ceph.com/issues/58535 Signed-off-by: Adam King <adking@redhat.com>
1a0dd74 to
db27d9c
Compare
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
Additionally, the staggered upgrade test is currently broken, so included an additional commit to attempt to fix that. Would recommend looking at the two commits separately when reviewing, especially since they both modify the staggered upgrade teuthology test
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows