Blueprint planner is overly conservative with respect to NTP zones and timesync

The planner is very careful to avoid doing anything else with a sled on which it has just placed an NTP zone. The comments in `do_plan_add()` document this well:

https://github.com/oxidecomputer/omicron/blob/fe9bc0dd8b6192017b20f033b8434b1e76852ba9/nexus/reconfigurator/planning/src/planner.rs#L501-L505

https://github.com/oxidecomputer/omicron/blob/fe9bc0dd8b6192017b20f033b8434b1e76852ba9/nexus/reconfigurator/planning/src/planner.rs#L545-L548

https://github.com/oxidecomputer/omicron/blob/fe9bc0dd8b6192017b20f033b8434b1e76852ba9/nexus/reconfigurator/planning/src/planner.rs#L594-L599

The final comment gets at the driving reason: sled-agent would reject requests to provision zones while time isn't synchronized yet.

With zone startup now going through the config reconciler, sled-agent no longer behaves this way: it's fine to send a sled config that includes both a new NTP zone and a set of zones that depend on time being synchronized. sled-agent will start the NTP zone and wait to start any zones that depend on timesync until that new NTP zone has synced with its upstream. We could probably simplify `do_plan_add()` a fair bit based on this, although I think we would have to make some explicit policy decisions that today are implicit in "do nothing until time is synced" (e.g., are we willing to place discretionary services on a sled before it syncs time?).
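To make the reconciler behavior described above concrete, here is a minimal sketch of "start NTP now, defer timesync-dependent zones until the clock is synced". Everything in it (`ZoneKind`, `PassResult`, `reconcile_pass`, and the rule that every non-NTP zone needs timesync) is a hypothetical simplification for illustration, not the actual sled-agent or omicron API.

```rust
// Hypothetical sketch: these types are NOT the real omicron / sled-agent API.

/// Kinds of zones, reduced to what matters for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneKind {
    Ntp,
    Crucible,
    Nexus,
}

impl ZoneKind {
    /// Assumption: every zone other than NTP needs a synchronized clock.
    fn requires_timesync(self) -> bool {
        !matches!(self, ZoneKind::Ntp)
    }
}

/// Outcome of one reconciliation pass over the desired zones.
#[derive(Debug, Default, PartialEq, Eq)]
struct PassResult {
    /// Zones that could be started during this pass.
    started: Vec<ZoneKind>,
    /// Zones held back (not rejected) until time is synchronized.
    deferred: Vec<ZoneKind>,
}

/// Start whatever can start now; defer, rather than reject, the rest.
fn reconcile_pass(desired: &[ZoneKind], time_synced: bool) -> PassResult {
    let mut result = PassResult::default();
    for &zone in desired {
        if zone.requires_timesync() && !time_synced {
            result.deferred.push(zone);
        } else {
            result.started.push(zone);
        }
    }
    result
}
```

Under these assumptions, a first pass over `[Ntp, Crucible, Nexus]` with `time_synced = false` starts only the NTP zone and defers the other two; a later pass with `time_synced = true` starts the deferred zones. The point for the planner is that a single sled config can carry all three zones without any request being rejected.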

	// After we make our initial pass through the sleds below to check for
	// zones every sled should have (NTP, Crucible), we'll start making
	// decisions about placing other service zones. We need to _exclude_ any
	// sleds for which we just added an NTP zone, as we won't be able to add
	// additional services to them until that NTP zone has been brought up.

	// Check for an NTP zone. Every sled should have one. If it's not
	// there, all we can do is provision that one zone. We have to wait
	// for that to succeed and synchronize the clock before we can
	// provision anything else.

	// Now we've established that the current blueprint _says_ there's
	// an NTP zone on this system. But we must wait for it to actually
	// be there before we can proceed to add anything else. Otherwise,
	// we may wind up trying to provision this zone at the same time as
	// other zones, and Sled Agent will reject requests to provision
	// other zones before the clock is synchronized.


Blueprint planner is overly conservative with respect to NTP zones and timesync #8353
