The planner is very careful to avoid doing anything with a sled on which it has just placed an NTP zone, which is well-commented inside of do_plan_add():
// After we make our initial pass through the sleds below to check for
// zones every sled should have (NTP, Crucible), we'll start making
// decisions about placing other service zones. We need to _exclude_ any
// sleds for which we just added an NTP zone, as we won't be able to add
// additional services to them until that NTP zone has been brought up.

// Check for an NTP zone. Every sled should have one. If it's not
// there, all we can do is provision that one zone. We have to wait
// for that to succeed and synchronize the clock before we can
// provision anything else.

// Now we've established that the current blueprint _says_ there's
// an NTP zone on this system. But we must wait for it to actually
// be there before we can proceed to add anything else. Otherwise,
// we may wind up trying to provision this zone at the same time as
// other zones, and Sled Agent will reject requests to provision
// other zones before the clock is synchronized.
The final comment gets at the driving reason: avoiding sled-agent rejecting requests to provision zones because time isn't synchronized yet.
With zone startup now going through the config reconciler, sled-agent no longer behaves this way: it's fine to send a sled config that includes both a new NTP zone and a set of zones that depend on time being synchronized. sled-agent will start the NTP zone, and wait to start any zones that depend on timesync until after that new NTP zone is sync'd with its upstream. We could probably simplify do_plan_add() a fair bit based on this, although I think we would have to make some explicit policy decisions that today are implicit in "do nothing until time is sync'd" (e.g., are we willing to place discretionary services on a sled before it syncs time?).
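The ordering described above can be sketched roughly as follows. This is only an illustration of the "start NTP now, hold time-dependent zones until sync" idea, not the config reconciler's actual code: `ZoneKind`, `requires_timesync`, and `zones_to_start` are hypothetical names, and the set of zones treated as time-dependent is an assumption made for the example.

```rust
/// Hypothetical zone kinds for this sketch; a real sled config has many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneKind {
    Ntp,
    Crucible,
    Nexus,
}

impl ZoneKind {
    /// Assumption for this sketch: every zone except NTP needs a synced clock.
    fn requires_timesync(self) -> bool {
        !matches!(self, ZoneKind::Ntp)
    }
}

/// Returns the zones from `desired` that may be started on this pass:
/// anything not already running, minus time-dependent zones while the
/// clock is still unsynchronized.
fn zones_to_start(
    desired: &[ZoneKind],
    running: &[ZoneKind],
    time_synced: bool,
) -> Vec<ZoneKind> {
    desired
        .iter()
        .copied()
        .filter(|zone| !running.contains(zone))
        .filter(|zone| time_synced || !zone.requires_timesync())
        .collect()
}
```

Under these assumptions, a single config containing a new NTP zone plus other zones is accepted as a whole: the first pass starts only NTP, and later passes pick up the remaining zones once `time_synced` becomes true.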
The comments quoted above are from omicron/nexus/reconfigurator/planning/src/planner.rs at commit fe9bc0d: lines 501 to 505, 545 to 548, and 594 to 599.