Transient failures in instance state monitor cause instances to be lost at sea

Sled agent spawns a state monitor task per instance that calls Propolis's `instance_state_monitor` endpoint, watches for VM state transitions, processes them, and relays any resulting instance state changes to Nexus. Any error in `Instance::monitor_state_task` or any of its callees bubbles up through the task and causes it to exit. After this, nothing monitors Propolis for subsequent state changes or notifies Nexus if they occur.

The DNS resolution error in #2726 brought this to my immediate attention, but any failure in the monitoring task will do the trick; another not-too-esoteric one is a Propolis panic that causes the call to `instance_state_monitor` to fail.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transient failures in instance state monitor cause instances to be lost at sea #2727

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Transient failures in instance state monitor cause instances to be lost at sea #2727

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions