Blueprint planner is overly conservative with respect to NTP zones and timesync #8353

@jgallagher

The planner is very careful to avoid doing anything with a sled on which it has just placed an NTP zone. This is well-commented inside do_plan_add():

// After we make our initial pass through the sleds below to check for
// zones every sled should have (NTP, Crucible), we'll start making
// decisions about placing other service zones. We need to _exclude_ any
// sleds for which we just added an NTP zone, as we won't be able to add
// additional services to them until that NTP zone has been brought up.

// Check for an NTP zone. Every sled should have one. If it's not
// there, all we can do is provision that one zone. We have to wait
// for that to succeed and synchronize the clock before we can
// provision anything else.

// Now we've established that the current blueprint _says_ there's
// an NTP zone on this system. But we must wait for it to actually
// be there before we can proceed to add anything else. Otherwise,
// we may wind up trying to provision this zone at the same time as
// other zones, and Sled Agent will reject requests to provision
// other zones before the clock is synchronized.

The final comment gets at the driving reason: avoiding sled-agent rejecting requests to provision zones because time isn't synchronized yet.
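The rule those comments describe can be sketched roughly as follows. This is a simplified illustration, not the actual planner code: `SledState`, its fields, and `sleds_eligible_for_services` are hypothetical names, and the assumption is that a sled only becomes eligible for other zones once its NTP zone is both in the blueprint and observed running.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a sled identifier.
pub type SledId = u32;

/// Hypothetical, simplified view of what the planner knows about one sled.
#[derive(Debug, Clone)]
pub struct SledState {
    pub id: SledId,
    /// The current blueprint already says this sled has an NTP zone.
    pub blueprint_has_ntp: bool,
    /// The NTP zone has actually been observed on the sled.
    pub observed_has_ntp: bool,
}

/// Returns the sleds on which the planner would consider placing other
/// zones in this pass. A sled with no NTP zone in the blueprint gets only
/// that NTP zone added; a sled whose NTP zone is not yet up is skipped.
pub fn sleds_eligible_for_services(sleds: &[SledState]) -> BTreeSet<SledId> {
    sleds
        .iter()
        .filter(|s| s.blueprint_has_ntp && s.observed_has_ntp)
        .map(|s| s.id)
        .collect()
}
```

Under this rule, a sled that just received its NTP zone sits idle for at least one full planning round trip, even if nothing else prevents placing zones on it.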

With zone startup now going through the config reconciler, sled-agent no longer behaves this way: it's fine to send a sled config that includes both a new NTP zone and a set of zones that depend on time being synchronized. sled-agent will start the NTP zone and hold off on starting any zones that depend on timesync until that new NTP zone has synced with its upstream. We could probably simplify do_plan_add() a fair bit based on this, although I think we would have to make some explicit policy decisions that today are implicit in "do nothing until time is synced" (e.g., are we willing to place discretionary services on a sled before it syncs time?).
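To make the reconciler behavior concrete, here is a minimal sketch of the ordering described above. It is not sled-agent's real implementation: `Zone`, `requires_timesync`, and `zones_startable_now` are hypothetical names, and which zones actually depend on timesync is left as a per-zone flag rather than assumed.

```rust
/// Hypothetical description of a zone in a sled config.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Zone {
    pub name: &'static str,
    /// Whether this zone must wait for the clock to be synchronized.
    pub requires_timesync: bool,
}

/// Given the zones in the sled config that are not yet running and the
/// current timesync status, return the zones that may be started now.
/// Zones that need timesync are deferred, not rejected, so a single
/// config can carry both the NTP zone and its dependents.
pub fn zones_startable_now(pending: &[Zone], time_synced: bool) -> Vec<Zone> {
    pending
        .iter()
        .filter(|z| time_synced || !z.requires_timesync)
        .cloned()
        .collect()
}
```

Calling this with `time_synced = false` yields only the zones that do not need timesync (the NTP zone itself); calling it again after sync yields the rest. The design point is that deferral happens on the sled, so the planner does not have to sequence these steps across blueprints.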
