[nexus] add InstanceState::SagaUnwound#5848
Closed
hawkw wants to merge 6 commits into
Closed
Conversation
This branch is part of ongoing work on the `instance-update` saga (see PR #5749); I've factored it out into a separate PR. This is largely because this branch makes mechanical changes to a bunch of different files that aren't directly related to the core change of #5749, and I'd like to avoid the additional merge conflicts that are likely when these changes remain un-merged for a long time.
We needed to add a filter to the view that prevents v2p mappings for inactive instances from showing up in the v2p_mapping view.
InstanceState::SagaUnwoundInstanceState::SagaUnwound
hawkw
added a commit
that referenced
this pull request
Jun 4, 2024
This branch is part of ongoing work on the `instance-update` saga (see PR #5749); I've factored it out into a separate PR. This is largely because this branch makes mechanical changes to a bunch of different files that aren't directly related to the core change of #5749, and I'd like to avoid the additional merge conflicts that are likely when these changes remain un-merged for a long time. --- Depends on #5854. As part of #5749, it is desirable to distinguish between *why* a VMM record was marked as `Destroyed`, in order to determine which saga is responsible for cleaning up that VMM's resources. The `instance-update` saga must be the only entity that can set an instance's VMM IDs (active and target) to NULL. Presently, however, the `instance-start` and `instance-migrate` sagas may also do this when they unwind. This is a requirement to avoid situations where a failed `instance-update` saga cannot unwind, because the instance's generation number has changed while the saga was executing. We want to ensure that if a start or migrate request fails, the instance will appear to be in the correct post-state as soon as the relevant API call returns. In order to do this, without having to also guarantee that an instance update has occurred, we introduce a new VMM state, `SagaUnwound`, with the following rules: - It is legal to start or migrate an instance if the `active_propolis_id` or `destination_propolis_id` (respectively) is either `NULL` or refers to a VMM that’s in the `SagaUnwound` state (the new VMM ID directly replaces the old ID). - If an instance has an `active_propolis_id` in the `SagaUnwound` state, then the instance appears to be `Stopped`. - If an instance has a `destination_propolis_id` in the `SagaUnwound` state, nothing special happens–the instance’s state is still derived from its active VMM’s state. - The update saga treats `SagaUnwound` VMMs as identical to `Destroyed` VMMs for purposes of deciding whether to remove a VMM ID from an instance. This branch adds a new `VmmState::SagaUnwound` variant. The `SagaUnwound` state is an internal implementation detail that shouldn't be exposed to an operator or in the external API. Sled-agents will never report that a VMM is in this state. Instead, this state is set my the `instance-start` and `instance-migrate` sagas when they unwind. When determining the API instance state from an instance and active VMM query so that the `SagaUnwound` state is mapped to `Destroyed`. Closes #5848, which this replaces.
gjcolombo
added a commit
that referenced
this pull request
Jun 5, 2024
Split the existing `instance_state` enum in the CRDB schema into separate instance state and VMM state enums and remove unused enum variants. Instances now have five states: Creating, NoVmm, Vmm, Failed, and Destroyed. VMMs have most of the states they had before, except that the unused Creating and Repairing states have been removed. This change makes it easier to add new states (e.g. SagaUnwound, see #5848) that apply only to instances/VMMs without having to update code that's working with the other type of DB record. Update the routines the IP attach/detach sagas use to decide where (if anywhere) to dispatch IP change messages. These routines were looking for instance states that weren't actually being used (the states of interest appear in the instances' active VMMs instead). Add a data migration test to make sure that the Stopped and Running instance states are converted as expected. Tests: `cargo nextest`.
hawkw
added a commit
that referenced
this pull request
Jun 5, 2024
This branch is part of ongoing work on the `instance-update` saga (see PR #5749); I've factored it out into a separate PR. This is largely because this branch makes mechanical changes to a bunch of different files that aren't directly related to the core change of #5749, and I'd like to avoid the additional merge conflicts that are likely when these changes remain un-merged for a long time. --- Depends on #5854. As part of #5749, it is desirable to distinguish between *why* a VMM record was marked as `Destroyed`, in order to determine which saga is responsible for cleaning up that VMM's resources. The `instance-update` saga must be the only entity that can set an instance's VMM IDs (active and target) to NULL. Presently, however, the `instance-start` and `instance-migrate` sagas may also do this when they unwind. This is a requirement to avoid situations where a failed `instance-update` saga cannot unwind, because the instance's generation number has changed while the saga was executing. We want to ensure that if a start or migrate request fails, the instance will appear to be in the correct post-state as soon as the relevant API call returns. In order to do this, without having to also guarantee that an instance update has occurred, we introduce a new VMM state, `SagaUnwound`, with the following rules: - It is legal to start or migrate an instance if the `active_propolis_id` or `destination_propolis_id` (respectively) is either `NULL` or refers to a VMM that’s in the `SagaUnwound` state (the new VMM ID directly replaces the old ID). - If an instance has an `active_propolis_id` in the `SagaUnwound` state, then the instance appears to be `Stopped`. - If an instance has a `destination_propolis_id` in the `SagaUnwound` state, nothing special happens–the instance’s state is still derived from its active VMM’s state. - The update saga treats `SagaUnwound` VMMs as identical to `Destroyed` VMMs for purposes of deciding whether to remove a VMM ID from an instance. This branch adds a new `VmmState::SagaUnwound` variant. The `SagaUnwound` state is an internal implementation detail that shouldn't be exposed to an operator or in the external API. Sled-agents will never report that a VMM is in this state. Instead, this state is set my the `instance-start` and `instance-migrate` sagas when they unwind. When determining the API instance state from an instance and active VMM query so that the `SagaUnwound` state is mapped to `Destroyed`. Closes #5848, which this replaces.
hawkw
added a commit
that referenced
this pull request
Jun 5, 2024
This branch is part of ongoing work on the `instance-update` saga (see PR #5749); I've factored it out into a separate PR. This is largely because this branch makes mechanical changes to a bunch of different files that aren't directly related to the core change of #5749, and I'd like to avoid the additional merge conflicts that are likely when these changes remain un-merged for a long time. --- Depends on #5854 and should merge only after that PR. As part of #5749, it is desirable to distinguish between *why* a VMM record was marked as `Destroyed`, in order to determine which saga is responsible for cleaning up that VMM's resources. The `instance-update` saga must be the only entity that can set an instance's VMM IDs (active and target) to NULL. Presently, however, the `instance-start` and `instance-migrate` sagas may also do this when they unwind. This is a requirement to avoid situations where a failed `instance-update` saga cannot unwind, because the instance's generation number has changed while the saga was executing. We want to ensure that if a start or migrate request fails, the instance will appear to be in the correct post-state as soon as the relevant API call returns. In order to do this, without having to also guarantee that an instance update has occurred, we introduce a new VMM state, `SagaUnwound`, with the following rules: - It is legal to start or migrate an instance if the `active_propolis_id` or `destination_propolis_id` (respectively) is either `NULL` or refers to a VMM that’s in the `SagaUnwound` state (the new VMM ID directly replaces the old ID). - If an instance has an `active_propolis_id` in the `SagaUnwound` state, then the instance appears to be `Stopped`. - If an instance has a `destination_propolis_id` in the `SagaUnwound` state, nothing special happens–the instance’s state is still derived from its active VMM’s state. - The update saga treats `SagaUnwound` VMMs as identical to `Destroyed` VMMs for purposes of deciding whether to remove a VMM ID from an instance. This branch adds a new `VmmState::SagaUnwound` variant. The `SagaUnwound` state is an internal implementation detail that shouldn't be exposed to an operator or in the external API. Sled-agents will never report that a VMM is in this state. Instead, this state is set my the `instance-start` and `instance-migrate` sagas when they unwind. When determining the API instance state from an instance and active VMM query so that the `SagaUnwound` state is mapped to `Destroyed`. Closes #5848, which this replaces.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This branch is part of ongoing work on the
instance-updatesaga (seePR #5749); I've factored it out into a separate PR. This is largely
because this branch makes mechanical changes to a bunch of different
files that aren't directly related to the core change of #5749, and I'd
like to avoid the additional merge conflicts that are likely when these
changes remain un-merged for a long time.
As part of #5749, it is desirable to distinguish between why a VMM
record was marked as
Destroyed, in order to determine which saga isresponsible for cleaning up that VMM's resources. The
instance-updatesaga must be the only entity that can set an instance's VMM IDs (active
and target) to NULL. Presently, however, the
instance-startandinstance-migratesagas may also do this when they unwind. This is arequirement to avoid situations where a failed
instance-updatesagacannot unwind, because the instance's generation number has changed
while the saga was executing.
We want to ensure that if a start or migrate request fails, the instance
will appear to be in the correct post-state as soon as the relevant API
call returns. In order to do this, without having to also guarantee that
an instance update has occurred, we introduce a new VMM state,
SagaUnwound, with the following rules:active_propolis_idordestination_propolis_id(respectively) iseither
NULLor refers to a VMM that’s in theSagaUnwoundstate(the new VMM ID directly replaces the old ID).
active_propolis_idin theSagaUnwoundstate,then the instance appears to be
Stopped.destination_propolis_idin theSagaUnwoundstate, nothing special happens–the instance’s state is still derived
from its active VMM’s state.
SagaUnwoundVMMs as identical toDestroyedVMMs for purposes of deciding whether to remove a VMM ID from an
instance.
This branch introduces the
SagaUnwoundstate. TheSagaUnwoundstateis an internal implementation detail that shouldn't be exposed to an
operator or in the external API. Therefore, I've chosen to separate the
enums representing the external API
InstanceStatetype and theinternal (database)
InstanceState, so that we don't have to add anInstanceState::SagaUnwoundvariant to the public OpenAPI spec whichwill never actually be returned. This, unfortunately, requires touching
a bunch of code, but the change is mostly mechanical. I've also updated
the code for determining the external API instance state from an
instance and active VMM query so that the
SagaUnwoundstate is mappedto
Destroyedpublicly, and instances whose active VMMs areStoppedremain in the
Stoppingstate (rather thanStopped), until a newstart saga for that instance is actually able to execute.