Failed instances should be allowed to stop and restart #2825

Nexus currently rejects every attempt to change the state of an instance that is in the Failed state: https://github.com/oxidecomputer/omicron/blob/ee0aac053f178980d0c2d0e6c40bea38688961f9/nexus/src/app/instance.rs#L395-L416

An instance can move to the Failed state for many reasons (e.g., Propolis failing to start the VM, or a heartbeat failure like those discussed in #2727). Users need to be able to stop a failed instance and then try to start it again.

(Note that on the Propolis side, a failed instance cannot be restarted; its Propolis zone must be destroyed and recreated.)
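Given that note, "restarting" a failed instance effectively means stopping it, which tears down the dead Propolis zone, and then starting it in a freshly created zone. The toy model below sketches that lifecycle under those assumptions. `Instance`, `PropolisZone`, `State`, `stop`, `start`, and `fail` are hypothetical names chosen for illustration; they are not Omicron or Propolis APIs.

```rust
// Toy model only: these types are hypothetical and are not Omicron or
// Propolis APIs.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Running,
    Stopped,
    Failed,
}

/// Stand-in for a Propolis zone; `id` distinguishes one zone from the next.
#[derive(Debug, PartialEq, Eq)]
struct PropolisZone {
    id: u32,
}

struct Instance {
    state: State,
    zone: Option<PropolisZone>,
    next_zone_id: u32,
}

impl Instance {
    fn new() -> Self {
        Instance { state: State::Stopped, zone: None, next_zone_id: 0 }
    }

    /// Simulates a failure (e.g. a lost heartbeat). The zone is left behind,
    /// but the VM inside it can no longer be restarted.
    fn fail(&mut self) {
        self.state = State::Failed;
    }

    /// Stopping tears down the zone. For a failed instance this is the only
    /// way out, since (per the note) Propolis cannot restart a failed VM.
    fn stop(&mut self) {
        match self.state {
            State::Running | State::Failed => {
                self.zone = None; // destroy the old Propolis zone
                self.state = State::Stopped;
            }
            State::Stopped => {} // already stopped: nothing to do
        }
    }

    /// Starting always provisions a brand-new zone.
    fn start(&mut self) -> Result<(), String> {
        if self.state != State::Stopped {
            return Err(format!("cannot start from {:?}", self.state));
        }
        self.zone = Some(PropolisZone { id: self.next_zone_id });
        self.next_zone_id += 1;
        self.state = State::Running;
        Ok(())
    }
}
```

In this model a start request against a Failed instance is refused, while stop followed by start succeeds and leaves the instance running in a different zone than before.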

```rust
fn check_runtime_change_allowed(
    &self,
    runtime: &nexus::InstanceRuntimeState,
) -> Result<(), Error> {
    // Users are allowed to request a start or stop even if the instance is
    // already in the desired state (or moving to it), and we will issue a
    // request to the SA to make the state change in these cases in case the
    // runtime state we saw here was stale. However, users are not allowed
    // to change the state of an instance that's migrating, failed or
    // destroyed.
    let allowed = match runtime.run_state {
        InstanceState::Creating => true,
        InstanceState::Starting => true,
        InstanceState::Running => true,
        InstanceState::Stopping => true,
        InstanceState::Stopped => true,
        InstanceState::Rebooting => true,
        InstanceState::Migrating => false,
        InstanceState::Repairing => false,
        InstanceState::Failed => false,
        InstanceState::Destroyed => false,
    };
    // ... (remainder of the function not shown)
}
```
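One possible fix, sketched here as a hypothetical, is to make the check aware of which action was requested, so that the Failed state can accept a stop without also accepting a reboot. `RequestedAction`, `StateChangeError`, and `check_state_change_allowed` are illustrative names, not existing Nexus types, and the policy encoded below (a failed instance must be stopped before it can be started again) is an assumption rather than a settled design.

```rust
use std::fmt;

/// Mirrors the run states matched in the Nexus snippet; defined locally so
/// this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Creating,
    Starting,
    Running,
    Stopping,
    Stopped,
    Rebooting,
    Migrating,
    Repairing,
    Failed,
    Destroyed,
}

/// Hypothetical: the state change the user asked for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestedAction {
    Start,
    Stop,
    Reboot,
}

/// Hypothetical error type standing in for Nexus's `Error`.
#[derive(Debug, PartialEq, Eq)]
struct StateChangeError {
    from: InstanceState,
    action: RequestedAction,
}

impl fmt::Display for StateChangeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot {:?} an instance in state {:?}", self.action, self.from)
    }
}

fn check_state_change_allowed(
    current: InstanceState,
    action: RequestedAction,
) -> Result<(), StateChangeError> {
    use InstanceState::*;
    use RequestedAction::*;
    let allowed = match (current, action) {
        // Unchanged: healthy states accept any request, as today.
        (Creating | Starting | Running | Stopping | Stopped | Rebooting, _) => true,
        // New: a failed instance may be stopped (tearing down its zone)...
        (Failed, Stop) => true,
        // ...but must reach Stopped before it can be started or rebooted.
        (Failed, Start | Reboot) => false,
        // Unchanged: these states still reject all requests.
        (Migrating | Repairing | Destroyed, _) => false,
    };
    if allowed {
        Ok(())
    } else {
        Err(StateChangeError { from: current, action })
    }
}
```

Under this policy, restarting a failed instance is a two-step stop-then-start, which fits the requirement that Propolis get a freshly created zone for the new VM.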