(5/5) [nexus] Implement Affinity/Anti-Affinity Groups in external API #7447

Merged
smklein merged 82 commits into main from affinity-integration on Feb 25, 2025


smklein commented Jan 30, 2025

Pulled out of #7076

This PR is a partial implementation of RFD 522.

It adds:

  • Affinity and Anti-Affinity groups, contained within projects. Each group is configured with a policy and a failure domain, and can currently contain zero or more members. Affinity groups attempt to co-locate members; anti-affinity groups attempt to avoid co-locating members.
    • Policy describes "what to do if we cannot fulfill the co-location request". Currently, these options are "fail" (reject the request) or "allow" (continue with provisioning of the group member regardless).
    • Failure Domain describes the scope of what is considered "co-located". In this PR, the only option is "sled", but in the future, this may be expanded to e.g. "rack".
    • Members describe what can be added to affinity/anti-affinity groups. In this PR, the only option is "instance". RFD 522 describes how "anti-affinity groups may also contain affinity groups" -- which is why this "member" terminology is introduced -- but it is not yet implemented.
  • Affinity and anti-affinity groups are exposed in the external API through a CRUD interface.
  • Affinity and anti-affinity groups are considered during "sled reservation", when an instance is placed on a sled. Most of this logic (and its testing) lives in nexus/db-queries/src/db/datastore/sled.rs, and was added in (4/5) [nexus] Consider Affinity/Anti-Affinity Groups during instance placement #7446.
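
A minimal sketch of the model described above, in Rust. This is not the code from this PR or from #7446: every type and function name here (`AffinityPolicy`, `FailureDomain`, `Member`, `GroupKind`, `Group`, `filter_sleds`) is hypothetical, and instances and sleds are identified by plain integers for brevity. It shows one way group membership could narrow the candidate sleds during reservation: "fail" groups act as hard filters, while "allow" groups narrow the candidates only when at least one candidate satisfies them.

```rust
use std::collections::{HashMap, HashSet};

/// What to do if the co-location request cannot be fulfilled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AffinityPolicy {
    Fail,  // reject the request
    Allow, // continue provisioning the member regardless
}

/// Scope of "co-located"; only `Sled` exists in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FailureDomain {
    Sled,
}

/// What can belong to a group; only instances for now. Plain integer IDs
/// are a simplification made by this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Member {
    Instance(u64),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GroupKind {
    Affinity,     // try to co-locate members
    AntiAffinity, // try to keep members apart
}

struct Group {
    kind: GroupKind,
    policy: AffinityPolicy,
    failure_domain: FailureDomain,
    members: Vec<Member>,
}

/// Hypothetical: narrow `candidates` (sled IDs) for placing `instance`,
/// given the sleds that other members currently occupy (`placement`).
fn filter_sleds(
    instance: Member,
    candidates: &[u32],
    groups: &[Group],
    placement: &HashMap<Member, u32>,
) -> Result<Vec<u32>, String> {
    let mut sleds = candidates.to_vec();
    // Apply hard ("fail") groups first, then soft ("allow") groups.
    for pass in [AffinityPolicy::Fail, AffinityPolicy::Allow] {
        let relevant = groups
            .iter()
            .filter(|g| g.policy == pass && g.members.contains(&instance));
        for g in relevant {
            // With a sled failure domain, "co-located" means "same sled".
            let FailureDomain::Sled = g.failure_domain;
            let occupied: HashSet<u32> = g
                .members
                .iter()
                .filter(|m| **m != instance)
                .filter_map(|m| placement.get(m).copied())
                .collect();
            if occupied.is_empty() {
                continue; // no other member is placed yet: nothing to enforce
            }
            let narrowed: Vec<u32> = sleds
                .iter()
                .copied()
                .filter(|s| match g.kind {
                    GroupKind::Affinity => occupied.contains(s),
                    GroupKind::AntiAffinity => !occupied.contains(s),
                })
                .collect();
            if !narrowed.is_empty() {
                sleds = narrowed;
            } else if pass == AffinityPolicy::Fail {
                return Err(format!("cannot satisfy {:?} group", g.kind));
            }
            // An unsatisfiable "allow" group leaves `sleds` unchanged.
        }
    }
    Ok(sleds)
}
```

Applying the hard constraints before the soft ones means an "allow" group can never widen what a "fail" group has already ruled out; the sketch makes no claim that the real reservation query orders them this way.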

Fixes #1705

benjaminleonard commented

Perhaps it would be useful to surface whether a member is presently satisfying an affinity request. I think I'd be interested to click into an affinity group and see its current affinity status. Or return a list of members that are currently failing to satisfy an affinity request.

Then the next question might be: how do I fix it? I presume that in most cases the answer is to stop and start the instance, which the user can do. Occasionally it is to reduce overall utilization, wait for a software update to finish, or add more capacity, which the user might not be privy to.
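
The status check suggested above (later tracked as #7614) could look roughly like the following hypothetical helper, `unsatisfied_members`, which is not part of this PR. It assumes each member is an instance ID paired with its current sled (`None` if not running). Its definition of "satisfied" for affinity groups (members on the sled hosting the most members are satisfied, with ties going to the lowest sled ID) is an assumption of this sketch, not something the PR specifies.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GroupKind {
    Affinity,
    AntiAffinity,
}

/// Hypothetical: given each member's current sled (`None` if not running),
/// return the IDs of running members that are not satisfying the group.
fn unsatisfied_members(kind: GroupKind, members: &[(u64, Option<u32>)]) -> Vec<u64> {
    // Count running members per sled.
    let mut per_sled: HashMap<u32, usize> = HashMap::new();
    for &(_, sled) in members {
        if let Some(s) = sled {
            *per_sled.entry(s).or_insert(0) += 1;
        }
    }
    match kind {
        // Anti-affinity: a member sharing its sled with another member is in violation.
        GroupKind::AntiAffinity => members
            .iter()
            .filter(|(_, sled)| sled.map_or(false, |s| per_sled[&s] > 1))
            .map(|(id, _)| *id)
            .collect(),
        // Affinity: pick the sled with the most members (lowest ID on ties) as
        // "home"; any running member on another sled is in violation.
        GroupKind::Affinity => {
            let home = per_sled
                .iter()
                .max_by_key(|(sled, count)| (**count, Reverse(**sled)))
                .map(|(sled, _)| *sled);
            members
                .iter()
                .filter(|(_, sled)| sled.is_some() && *sled != home)
                .map(|(id, _)| *id)
                .collect()
        }
    }
}
```

Stopped members are never reported, since they occupy no sled; whether a real status API should surface them separately is left open here.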

Base automatically changed from affinity-instance-integration to main February 24, 2025 23:07
@smklein

smklein commented Feb 25, 2025

> Perhaps it would be useful to surface whether a member is presently satisfying an affinity request. I think I'd be interested to click into an affinity group and see its current affinity status. Or return a list of members that are currently failing to satisfy an affinity request.

I filed #7614 to track this. I think it's a totally reasonable request.

> Then the next question might be: how do I fix it? I presume that in most cases the answer is to stop and start the instance, which the user can do. Occasionally it is to reduce overall utilization, wait for a software update to finish, or add more capacity, which the user might not be privy to.

This is more subtle. In some cases we could presumably resolve it automatically by live-migrating the instance, but doing so feels a little opinionated. This may justify an additional affinity group policy beyond the "policy = allow" we currently have. Maybe we want "policy = allow, but if we can't fulfill it, keep it where it is" vs. "policy = allow, and if we can't fulfill it now, move it later".
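
To make the distinction between the two proposed "allow" variants concrete, here is a hypothetical sketch. Neither `AllowAndStay` nor `AllowAndRebalance` exists in this PR, which only has "fail" and "allow", and the `Action` enum and `on_unsatisfiable` function are invented for illustration. The sketch separates what happens at provisioning time from what happens when an already-running member later turns out to be unsatisfied; treating a "fail" group as "leave it where it is" after provisioning is an assumption of the sketch.

```rust
/// Hypothetical policy set; only `Fail` and a single `Allow` exist in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AffinityPolicy {
    Fail,
    /// Allow, and leave an unsatisfied member where it is.
    AllowAndStay,
    /// Allow now, but try to move (e.g. live-migrate) the member later.
    AllowAndRebalance,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    RejectProvisioning,
    ProvisionAnyway,
    Leave,
    ScheduleMigration,
}

/// What to do when a member's group cannot be satisfied, either while the
/// member is being provisioned or later, while it is already running.
fn on_unsatisfiable(policy: AffinityPolicy, at_provision_time: bool) -> Action {
    match (policy, at_provision_time) {
        (AffinityPolicy::Fail, true) => Action::RejectProvisioning,
        (_, true) => Action::ProvisionAnyway,
        (AffinityPolicy::AllowAndRebalance, false) => Action::ScheduleMigration,
        // Both `Fail` and `AllowAndStay` leave a running member in place.
        (_, false) => Action::Leave,
    }
}
```

Splitting the decision on `at_provision_time` reflects the point above: the two "allow" variants behave identically when an instance starts and differ only in whether the system acts on the violation afterwards.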

@smklein smklein merged commit d5e8052 into main Feb 25, 2025
@smklein smklein deleted the affinity-integration branch February 25, 2025 19:30

Development

Successfully merging this pull request may close these issues.

Want ability to have anti-affinity

4 participants