[nexus] Reincarnate instances with SagaUnwound VMMs #6669
Merged
When merging this, we should also be sure to merge #6658, since otherwise,
since we print these in OMDB, it breaks the success cases expectorate tests to use unordered hashmaps...
i dont know whats wrong with me
gjcolombo
approved these changes
Sep 27, 2024
Co-authored-by: Greg Colombo <greg@oxidecomputer.com>
Well that's extremely spooky, it looks like this worked fine on commit 0b7f72e but then somehow broke on commit 8f89106: https://buildomat.eng.oxide.computer/wg/0/details/01J8T6F6B4TYVZVGS9NVY6RXJ8/m4ivC9CI7YNrcLE1S1dUTosEDmIl3bfax3fd4qNVIe7XiKua/01J8T6G2PKB9TADGAMV5DAR8R8
(also, it occurred to me that we probably want to make unwinding start sagas check if they should immediately kick the reincarnation task...)
Aaaand it passes on my machine: I bet this is a race between periodic and explicit activations of the reincarnation task. Cool.
0645a37 to 19f9f16
When an `instance-start` saga unwinds, any VMM it created transitions to the `SagaUnwound` state. This causes the instance's effective state to appear as `Failed` in the external API. PR #6503 added functionality to Nexus to automatically restart instances that are in the `Failed` state ("instance reincarnation"). However, the current instance-reincarnation task will not automatically restart instances whose instance-start sagas have unwound, because such instances are not actually in the `Failed` state from Nexus' perspective.

This PR implements reincarnation for instances whose `instance-start` sagas have failed. This is done by changing the `instance_reincarnation` background task to query the database for instances which have `SagaUnwound` active VMMs, and then run `instance-start` sagas for them identically to how it runs start sagas for `Failed` instances.

I decided to perform two separate queries to list `Failed` instances and to list instances with `SagaUnwound` VMMs, because the `SagaUnwound` query requires a join with the `vmm` table, and I thought it was a bit nicer to be able to find `Failed` instances without having to do the join, and only do it when looking for `SagaUnwound` ones. Also, having two queries makes it easier to distinguish between `Failed` and `SagaUnwound` instances in logging and the OMDB status output. This ended up being implemented by adding a parameter to the `DataStore::find_reincarnatable_instances` method that indicates which category of instances to select; I had previously considered making the method on the `InstanceReincarnation` struct that finds instances and reincarnates them take the query as a `Fn` taking the datastore and `DataPageParams` and returning an `impl Future` outputting `Result<Vec<Instance>, ...>`, but figuring out generic lifetimes for the pagination stuff was annoying enough that this felt like the simpler choice.
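For readers skimming the description, here is a rough, in-memory sketch of the "one method, one category parameter" shape described above. It is not the code from this PR: the enum name `ReincarnationReason`, the simplified `InstanceState` and `VmmState` enums, the `Instance` struct layout, and the free function standing in for `DataStore::find_reincarnatable_instances` are all assumptions made for illustration, and a `HashMap` lookup stands in for the join with the `vmm` table. It also assumes an instance with a `SagaUnwound` active VMM still records a non-`Failed` state of its own, as the description implies.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real database-backed types (assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceState {
    NoVmm,
    HasVmm,
    Failed,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VmmState {
    Running,
    SagaUnwound,
}

#[derive(Debug)]
struct Instance {
    id: u32,
    state: InstanceState,
    active_vmm_id: Option<u32>,
}

// Hypothetical name for the "which category to select" parameter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReincarnationReason {
    Failed,
    SagaUnwound,
}

// In-memory stand-in for the datastore query. Only the `SagaUnwound`
// arm needs to consult VMM state (the analogue of the join).
fn find_reincarnatable_instances(
    instances: &[Instance],
    vmm_states: &HashMap<u32, VmmState>,
    reason: ReincarnationReason,
) -> Vec<u32> {
    instances
        .iter()
        .filter(|i| match reason {
            // No VMM lookup needed: the instance record itself says `Failed`.
            ReincarnationReason::Failed => i.state == InstanceState::Failed,
            // Look up the active VMM, if any, and check its state.
            ReincarnationReason::SagaUnwound => i
                .active_vmm_id
                .and_then(|vmm_id| vmm_states.get(&vmm_id))
                .map_or(false, |s| *s == VmmState::SagaUnwound),
        })
        .map(|i| i.id)
        .collect()
}

// Small made-up data set used by `main` below.
fn demo_data() -> (Vec<Instance>, HashMap<u32, VmmState>) {
    let instances = vec![
        Instance { id: 1, state: InstanceState::HasVmm, active_vmm_id: Some(10) },
        Instance { id: 2, state: InstanceState::Failed, active_vmm_id: None },
        Instance { id: 3, state: InstanceState::HasVmm, active_vmm_id: Some(11) },
        Instance { id: 4, state: InstanceState::NoVmm, active_vmm_id: None },
    ];
    let vmm_states = HashMap::from([
        (10, VmmState::Running),
        (11, VmmState::SagaUnwound),
    ]);
    (instances, vmm_states)
}

fn main() {
    let (instances, vmm_states) = demo_data();
    // One pass per category, so logs and status output can report
    // each category separately.
    for reason in [ReincarnationReason::Failed, ReincarnationReason::SagaUnwound] {
        let found = find_reincarnatable_instances(&instances, &vmm_states, reason);
        println!("{reason:?}: {found:?}");
    }
}
```

Running each category as its own pass keeps the `Failed` path free of the VMM lookup and gives the caller a natural place to tag results by category; the sketch ignores pagination and any other conditions the real query may check.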
Fixes #6638