mgr/cephadm: upgrade multiple OSDs in parallel #39726

Merged
liewegas merged 7 commits into ceph:master from
liewegas:cephadm-parallel-osd-upgrade
Mar 4, 2021

Conversation

@liewegas
Member

Make use of the new ok-to-stop structured output and --max argument to expand the list of OSDs to restart, speeding up upgrades.
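A minimal sketch of how the structured output might be used to expand the restart list. The JSON shape here (an `ok_to_stop` flag plus an `osds` list) is an assumption for illustration, and `expand_osds_to_stop` is a hypothetical helper, not code from this PR.

```python
import json
from typing import List


def expand_osds_to_stop(stdout: str, requested: List[int]) -> List[int]:
    """Return the OSD ids that may be stopped together.

    Assumes (hypothetically) structured output of the form
    {"ok_to_stop": true, "osds": [0, 1, 2]}.
    """
    try:
        j = json.loads(stdout)
    except json.JSONDecodeError:
        # Unstructured output: keep only what the caller asked about.
        return list(requested)
    if not isinstance(j, dict) or not j.get("ok_to_stop"):
        # The check did not report that stopping is safe.
        return []
    osds = j.get("osds")
    if not isinstance(osds, list):
        return list(requested)
    # The check may have grown the set up to the --max limit.
    return [int(o) for o in osds]
```

With this shape, a check requested for one OSD can come back with several ids, all of which can be restarted in the same pass.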

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@liewegas
Member Author

jenkins test make check

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@liewegas liewegas force-pushed the cephadm-parallel-osd-upgrade branch from c1b31f6 to 3a431b4 Compare March 1, 2021 15:17
    j = json.loads(r.stdout)
except json.decoder.JSONDecodeError:
    self.mgr.log.warning("osd ok-to-stop didn't return structured result")
    pass
Contributor

raise

I'd be 👍 for seeing tracker issues if ok-to-stop returns something invalid.
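The suggestion amounts to logging the problem and then re-raising instead of swallowing it. A minimal sketch, assuming a plain `logging` logger in place of the manager's logger (`parse_structured_result` is a hypothetical name):

```python
import json
import logging

log = logging.getLogger("upgrade-sketch")


def parse_structured_result(stdout: str) -> dict:
    """Parse command output as JSON; log and re-raise on invalid output."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        log.warning("osd ok-to-stop didn't return a structured result")
        # Re-raise so the failure surfaces instead of being silently ignored.
        raise
```

The trade-off is that an invalid result now aborts the caller rather than falling back to a single-OSD restart.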

break

num = 1
for d in to_upgrade:
Contributor

You need to use @cephadm.utils.forall_hosts in order to run this in parallel.

Member Author

I'm not sure it's necessary to parallelize this, tbh. Leaving that as a future optimization.
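For reference, the difference between the serial loop and a parallel one can be sketched with the standard library. This uses `concurrent.futures` as a stand-in and is not the `forall_hosts` decorator mentioned above; `redeploy_all` and the `redeploy` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def redeploy_all(daemons: Iterable[str],
                 redeploy: Callable[[str], str],
                 parallel: bool = False) -> List[str]:
    """Redeploy each daemon, serially by default or via a thread pool."""
    daemons = list(daemons)
    if not parallel:
        # Serial variant: one daemon after another, as in the loop under review.
        return [redeploy(d) for d in daemons]
    # Parallel variant: results come back in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(daemons))) as pool:
        return list(pool.map(redeploy, daemons))
```

Both variants return the same results in the same order; only the wall-clock time differs when `redeploy` blocks.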

liewegas added 6 commits March 2, 2021 13:56
Signed-off-by: Sage Weil <sage@newdream.net>
Optionally provide a list of previously known-to-be-ok-to-stop items to
the ok_to_stop method. This has to get plumbed through a zillion instances
of this class method.

No functional change (yet).

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Restart multiple osds in a single upgrade pass, when possible.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
This message is shown during the upgrade process.

Signed-off-by: Sage Weil <sage@newdream.net>
@liewegas liewegas force-pushed the cephadm-parallel-osd-upgrade branch from 3a431b4 to fb778a3 Compare March 2, 2021 19:07
to_upgrade.append(d)
continue

if not self._wait_for_ok_to_stop(d, known_ok_to_stop):
Contributor

took me a while to figure out that _wait_for_ok_to_stop changes known_ok_to_stop in this loop

Member Author

added some comments!
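The in/out behaviour discussed here can be illustrated with a small sketch. The functions below are hypothetical stand-ins, not the PR's implementation: `check` plays the role of the ok-to-stop query and returns the ids that may be stopped together.

```python
from typing import Callable, List


def wait_for_ok_to_stop(daemon_id: str, known: List[str],
                        check: Callable[[str], List[str]]) -> bool:
    """Hypothetical stand-in; `known` is an in/out parameter."""
    if daemon_id in known:
        return True  # already covered by an earlier check
    ok = check(daemon_id)
    if not ok:
        return False
    # Mutate the caller's list in place so later iterations can see it.
    known.clear()
    known.extend(ok)
    return True


def pick_upgrade_batch(candidates: List[str],
                       check: Callable[[str], List[str]]) -> List[str]:
    known_ok_to_stop: List[str] = []
    to_upgrade: List[str] = []
    for d in candidates:
        if d in known_ok_to_stop:
            # Covered by a previous check; no need to ask again.
            to_upgrade.append(d)
            continue
        if not wait_for_ok_to_stop(d, known_ok_to_stop, check):
            break
        to_upgrade.append(d)
    return to_upgrade
```

Because the list is updated in place, the first successful check can cover several later candidates, which is easy to miss when reading the loop alone.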

Signed-off-by: Sage Weil <sage@newdream.net>
@liewegas liewegas merged commit c7e1d20 into ceph:master Mar 4, 2021