[reconfigurator] Executor: only do cleanup for zones that are ready_for_cleanup#7713
Conversation
| ) { | ||
| // We expect to only be called with expunged zones that are ready | ||
| // for cleanup; skip any with a different disposition. | ||
| if !config.disposition.is_ready_for_cleanup() { |
There was a problem hiding this comment.
Would this conditional be a bug if it was invoked? Should we at least be logging something here?
There was a problem hiding this comment.
Hmm, I could arguments for any of these options:
- Document this function as "panics if called with a zone that isn't ready for cleanup" and assert.
- Document this function as "returns an error for any zone that isn't ready for cleanup".
- Keep this check and log a warning.
- Keep this check and don't log anything.
I prefer option 4, I think? "Call this method with whatever set of zones you want, and I locally choose to act on the ones I know it's safe to act on". Honestly I'd kind of prefer this take a &Blueprint instead of an iterator of zones to avoid this problem altogether, but that makes the tests more awkward. Maybe this should be split into a pub(crate) version that takes a &Blueprint and a private version that takes this iterator; the &Blueprint one could do the filtering, and tests could call the private one?
There was a problem hiding this comment.
I'm going to take a stab at cleaning this up in a small followup PR.
andrewjstone
left a comment
There was a problem hiding this comment.
Great cleanup. Love to see the progress here!
ready_for_cleanupready_for_cleanup
Builds on #7713, and is followup from #7713 (comment). In #7652 I changed all the executor substeps to take iterators instead of `&BTreeMap` references that no longer existed, but that introduced a weird split where the top-level caller had to filter the blueprint down to just the items that the inner functions expected. @smklein pointed out one place where the inner code was being extra defensive, which was just more confusing than anything. This PR removes that split: the top-level executor now always passes a full `&Blueprint` down, and the inner modules are responsible for doing their own filtering as appropriate. To easy testing, I kept the versions that take an iterator of already-filtered items as private `*_impl` functions that the new functions-that-take-a-full-`Blueprint` themselves call too.
Changes the three steps that required the earlier
PUT /omicron-zonesexecution step to succeed because they needed to clean up after zones that were definitely shut down to now ready theready_for_cleanupblueprint disposition (which is set by the planner after it confirms via inventory that the zones are dead).While I was here, I removed both the special case error-to-warning implementation in a couple of steps and the
finalizestep that compiled those. I believe all of that predates the ability foromdbto interpret the update-engine event report. Now that it can, emitting step warnings directly is fine. I spun this branch up in a4x2 and then shut down one of the sled-agents; that results in this report the executor:Closes #6999, under the assumption that as of this PR we've identified all the inter-step dependencies and broken them.