
Improve "services PUT" API: Execute in parallel and allow zone teardown #3676

Merged: smklein merged 2 commits into main from service-parallel on Jul 17, 2023

smklein (Collaborator) commented Jul 17, 2023

This should substantially reduce the execution time of both RSS (the Rack Setup Service) and sled boot.

Measured from running "omicron-package install" until "Nexus Handoff Complete" (which indicates that RSS has fully executed), on a 24-core, 64 GiB workstation running Helios:

Without this change: 9m8.171s
With this change: 3m58.622s
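For scale, the two timings above work out to roughly a 2.3× speedup (arithmetic derived only from the reported numbers):

$$
\begin{aligned}
t_{\text{before}} &= 9\,\text{min}\ 8.171\,\text{s} = 548.171\,\text{s} \\
t_{\text{after}} &= 3\,\text{min}\ 58.622\,\text{s} = 238.622\,\text{s} \\
\text{speedup} &= \frac{548.171}{238.622} \approx 2.30 \\
\text{reduction} &= 1 - \frac{238.622}{548.171} \approx 56.5\%
\end{aligned}
$$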

Before this PR

  • The /services PUT API was purely additive: removing zones from the set to be run was unsupported.
  • Zone initialization was performed serially.

With this PR

  • The /services PUT API can now both add and remove zones.
  • Zone initialization runs concurrently, via `try_for_each_concurrent`.
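Supporting both addition and removal amounts to treating the PUT body as the complete desired set of zones and diffing it against the zones already running. The sketch below is a minimal, standard-library-only illustration that assumes zones are identified by plain string names; `ZonePlan` and `plan_zones` are hypothetical names, not the sled agent's actual types or functions.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of comparing running zones with a PUT request.
#[derive(Debug, PartialEq)]
struct ZonePlan {
    /// Requested zones that are not yet running: initialize these.
    to_add: Vec<String>,
    /// Running zones absent from the request: tear these down.
    to_remove: Vec<String>,
}

/// Treat the request as the full desired set, not an addition to it.
fn plan_zones(running: &BTreeSet<String>, requested: &BTreeSet<String>) -> ZonePlan {
    ZonePlan {
        to_add: requested.difference(running).cloned().collect(),
        to_remove: running.difference(requested).cloned().collect(),
    }
}

fn main() {
    // Illustrative zone names only.
    let running: BTreeSet<String> = ["nexus", "crucible", "ntp"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let requested: BTreeSet<String> = ["nexus", "ntp", "clickhouse"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    let plan = plan_zones(&running, &requested);
    // BTreeSet iterates in sorted order, so the output is deterministic.
    println!("add: {:?}, remove: {:?}", plan.to_add, plan.to_remove);
}
```

Here `nexus` and `ntp` are left alone, `clickhouse` is scheduled for initialization, and `crucible` for teardown. Using ordered sets keeps the plan deterministic, which makes repeated PUTs with the same body idempotent.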

Fixes #726

iliana (Collaborator) left a comment

yay parallelism!

smklein merged commit e2aa665 into main on Jul 17, 2023
smklein deleted the service-parallel branch on July 17, 2023 at 21:41
smklein added a commit that referenced this pull request on Jul 19, 2023:
We have observed that, after a failure to initialize services, the sled
agent can enter a "split-brain" state: on subsequent requests to
initialize services, it is not aware of zones that it created itself.

I believe this is one possible cause: while integrating #3676, I used
`try_for_each_concurrent` to parallelize zone bringup. That operation
can return early, dropping all other in-flight zone initialization
tasks (for example, when one initialization fails or the operation is
cancelled).

This PR mitigates the issue by preventing concurrent drops during
service initialization.
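The difference between stopping at the first error and letting every started initialization finish can be sketched with standard-library threads instead of async futures. This is a swapped-in technique chosen to keep the example dependency-free, not the fix this commit applies; `init_zone` and `init_all` are hypothetical stand-ins for real zone bringup. Because `std::thread::scope` joins every spawned thread before returning, no in-flight initialization is abandoned, and every success and failure is recorded.

```rust
use std::thread;

/// Hypothetical zone bringup: fails for names starting with "bad".
fn init_zone(name: &str) -> Result<String, String> {
    if name.starts_with("bad") {
        Err(format!("failed to initialize {}", name))
    } else {
        Ok(name.to_string())
    }
}

/// Initialize all zones concurrently and wait for every one to finish.
/// Returns (zones that came up, every error that occurred).
fn init_all(names: &[&str]) -> (Vec<String>, Vec<String>) {
    // `thread::scope` joins all spawned threads before it returns,
    // so no initialization is dropped partway through.
    let results: Vec<Result<String, String>> = thread::scope(|s| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| s.spawn(move || init_zone(name)))
            .collect();
        // Joining in spawn order keeps results in input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("initialization thread panicked"))
            .collect()
    });

    let mut started = Vec::new();
    let mut errors = Vec::new();
    for r in results {
        match r {
            Ok(zone) => started.push(zone),
            Err(e) => errors.push(e),
        }
    }
    (started, errors)
}

fn main() {
    let (started, errors) = init_all(&["nexus", "bad-crucible", "ntp"]);
    // Zones that succeeded are still known even though one failed.
    println!("started: {:?}", started);
    println!("errors: {:?}", errors);
}
```

In async Rust, a comparable approach is to drive every future to completion (for example with `futures::future::join_all`) before inspecting errors, so the agent's record of which zones exist stays consistent with what was actually created.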


Development

Successfully merging this pull request may close these issues.

[sled-agent] Parallelize service initialization
