Skip to content

Planner/builder split: Move image source decisions to the planner#8472

Merged
jgallagher merged 8 commits into
mainfrom
john/planner-zone-image-source
Jul 3, 2025
Merged

Planner/builder split: Move image source decisions to the planner#8472
jgallagher merged 8 commits into
mainfrom
john/planner-zone-image-source

Conversation

@jgallagher

Copy link
Copy Markdown
Contributor

This is a draft because I haven't updated any of the tests yet; I wanted to get some a yay/nay on whether we like this direction before doing that work.

Builds on #8470.

The big changes here are:

  • BlueprintBuilder no longer decides on image sources for zones. All the sled_add_zone_* functions now take an extra image_source argument.
  • is_zone_ready_for_update() is now a method in the planner, not the builder. This is used both when deciding whether a zone can be updated and also when choosing the image source for new zones to be added.
  • can_zone_be_shut_down_safely() is now a separate method in the planner. This is only used when deciding whether a zone can be updated.
  • zone_image_source is now a method on TargetReleaseDescription. It returns an error if we have a TUF repo that doesn't have exactly one matching artifact for a given ZoneKind. In the planner, this bubbles out a few ways:
    • we'll fail planning if we try to add a zone that isn't in the target release TUF repo
    • we'll refuse to update a zone if it isn't in the target release TUF repo (but this won't fail planning)
    • we'll refuse to update Nexus (because it won't be able to tell whether whatever zone type is missing has itself been updated)

Comment on lines +1116 to +1119
if !self.can_zone_be_shut_down_safely(kind) {
return false;
}
match self.is_zone_ready_for_update(kind) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's very likely we'll want to propagate "reasons" out from here, beyond just logging.

E.g., with cockroach, if we say "You cannot update because you have underreplicated ranges", that is a planner-decision that's probably worth documenting. Otherwise, the planner will just say "can't update now", which isn't that great - not even really identifying which zone needs work, or why.

TL;DR: We probably want a stronger return type than a boolean here, for both of these conditions? something like

enum ShutdownSafety {
  CanShutdown,
  CannotShutdown {
    reason: String,
  } 
}

enum UpdateReadiness {
  Ready,
  NotReady {
    reason: String,
  } 
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. I'm inclined to say that should come in with whatever we do to fix #8284? Anything I do here will certainly have to change for whatever the full reporting solution is, so I just did the simplest thing for now.

Comment thread nexus/reconfigurator/planning/src/planner.rs Outdated
let mut updateable_zones =
out_of_date_zones.filter(|(_sled_id, zone, _new_image_source)| {
let kind = zone.zone_type.kind();
if !self.can_zone_be_shut_down_safely(kind) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This separation makes sense to me. I was a little worried with the old format, where adding the check for "live_nodes == 5" because that could prevent us from adding a CRDB node when "live_nodes == 4" (which would be bad).

But this structure should let us still make that check, and we only need to validate it in the "update-in-place" case, not necessarily in the "add-new-zone" case.

@jgallagher jgallagher force-pushed the john/planner-zone-image-source branch from c5499e2 to 57bb99d Compare June 30, 2025 17:38
Base automatically changed from john/planning-input-fewer-options to main June 30, 2025 19:19
* split `is_zone_ready_for_update` into that and `can_zone_be_shut_down_safely`
* move both methods to the planner
* move zone image source method to `TargetReleaseDescription`
* log errors if a TUF repo is missing an artifact for a known zone kind
@jgallagher jgallagher force-pushed the john/planner-zone-image-source branch from 57bb99d to ff3507a Compare June 30, 2025 20:23
@jgallagher jgallagher marked this pull request as ready for review July 1, 2025 14:01
// attached.
let target_release_generation = Generation::from_u32(2);

// Manually specify a trivial TUF repo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part of the test because we now get planner errors if we try to plan an update with a TUF repo that contains no artifacts. Specifically: it will check whether Nexus can be updated, and fail because it can't determine whether any other zones are updated (because the TUF repo has no artifacts for them).

) -> anyhow::Result<BlueprintZoneImageSource> {
if let Some(image_source) = image_source {
Ok(image_source.into())
} else if planning_input.tuf_repo() == planning_input.old_repo() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a comment here explaining what this is doing? I think I understand (this follows up from your change about making the base case for old_repo an unspecified target release) but an explanation would be useful here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 56a5149; feels kinda meandering though. Is it clear enough?

Comment thread dev-tools/reconfigurator-cli/src/lib.rs Outdated
.zone_image_source(zone_kind)
.context("could not determine image source")
} else {
bail!("must specify image source for new zone")

@sunshowers sunshowers Jul 1, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to return the allowed set of image sources here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Done in 56a5149. It's kinda wordy, but nice to be able to copy and paste:

〉blueprint-edit 5ea6b546-a9e7-4cef-b7aa-b1aafa52c7c4 add-nexus serial0
error: must specify image source for new zone; options: `install-dataset`, `artifact 1.0.0 0e32b4a3e5d3668bb1d6a16fb06b74dc60b973fa479dcee0aae3adbb52bf1388` (from current TUF repo)
〉blueprint-edit 5ea6b546-a9e7-4cef-b7aa-b1aafa52c7c4 add-nexus serial0 install-dataset
blueprint 9b922100-71fb-45a0-9d2c-d3f6301e26b6 created from blueprint 5ea6b546-a9e7-4cef-b7aa-b1aafa52c7c4: added Nexus zone to sled c8f312d9-181e-47ca-8f2c-acda3dc0d584
〉blueprint-edit 5ea6b546-a9e7-4cef-b7aa-b1aafa52c7c4 add-nexus serial0 artifact 1.0.0 0e32b4a3e5d3668bb1d6a16fb06b74dc60b973fa479dcee0aae3adbb52bf1388
blueprint e9d83e96-0fc8-4b12-b5c7-9153cb71e7cc created from blueprint 5ea6b546-a9e7-4cef-b7aa-b1aafa52c7c4: added Nexus zone to sled c8f312d9-181e-47ca-8f2c-acda3dc0d584

@jgallagher jgallagher merged commit d4cf01a into main Jul 3, 2025
16 checks passed
@jgallagher jgallagher deleted the john/planner-zone-image-source branch July 3, 2025 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants