[sled-agent] Allow PUT /omicron-zones to accept filesystem_pool patches#7449
Conversation
…ches This is the first half of #7229. The second half is to teach Reconfigurator to actually fill in the missing filesystem_pool values; that will be upcoming shortly.
| } | ||
|
|
||
| #[test] | ||
| fn test_fix_7229_zone_config_reconciliation() { |
There was a problem hiding this comment.
Thanks for all the tests, this looks good
| // | ||
| // then we don't want to shut the zone down and restart it, because the | ||
| // config hasn't actually changed in any meaningful way; this is just | ||
| // reconfigurator correcting #7229. |
There was a problem hiding this comment.
More just out of curiosity than anything: Are there other situations where this is true? Why is #7229 special?
There was a problem hiding this comment.
I don't think there are any other such situations today, but there could be in the future.
Stepping back for context, if we get to this point, we have a situation where:
- sled-agent has previously been given an
OmicronZoneConfigfor a particular zone; in principle, this defines all the information sled-agent needs to configure and run the zone - sled-agent has now been given an
OmicronZoneConfigfor the same zone that differs in some way
Without more information, sled-agent would need to be able to either reconfigure the zone on the fly to adjust for any changes in that config, or (for simplicity) tear the zone down and start it up again with the new configuration. For example, imagine we changed the upstream NTP servers; today, sled-agent has no way of reconfiguring a boundary NTP zone, but we could always shut the zone down and then restart it with the new config.
#7229 is special because changing the config in a way that is consistent with the already-running zone; it's just information that was missing when we started it. If the old OmicronZoneConfig had filesystem_pool: None, sled-agent had to choose some filesystem_pool on its own. The fix for #7229 means the new configs coming in now fill in filesystem_pool: Some(pool_sled_agent_chose). The zone configs are different, but not in a way that necessitates reconfiguring or restarting the zone.
I imagine we'll have future cases similar to this if/when we expand OmicronZoneConfig (e.g., fixing #7214 will add dataset IDs to zone configs in the blueprint; if that makes its way down to OmicronZoneConfig, we'll presumably want to do something similar where if we're just filling in the dataset that the zone is already using, we don't need to reconfigure or restart it).
There was a problem hiding this comment.
Thanks for the context. What I was thinking about reading this comment (from the perspective of a future reader) was how I would know what other cases might be special besides 7229. I think the last sentence sort of gets at the more general "why"; maybe that could be moved up into a summary toward the top of the comment, but given that discussion is captured in the PR that's probably fine for now. Your call!
This builds on #7449 and, in combination with it, fixes #7229. When planning a new blueprint, one of the first things we now do is loop over all in-service zones on all active sleds and check that their `filesystem_pool` is consistent with the latest inventory collection from that sled. The expectation here is that we'll be filling in `filesystem_pool: None` values from older systems (i.e., #7229). There's also an edge case where we might need to correct our own incorrect backfills, if a sled-agent has restarted since the inventory collection was taken and has chosen a different zpool for some zone. If we can't find what the correct filesystem pool should be (e.g., if the sled isn't present in inventory at all, or none of its zpool/dataset combinations match the zone), we leave the `filesystem_pool` value untouched. I'll put testing notes from a4x2 in a comment below.
This is the first half of #7229. The second half is to teach Reconfigurator to actually fill in the missing filesystem_pool values; that will be upcoming shortly, and I'll put testing notes in it.