The Crucible downstairs zones were wiped out when sled-agent crashed and restarted:
09:00:01.266Z INFO SledAgent (PortManager): Mapping virtual NIC to physical host
mapping = SetVirtualNetworkInterfaceHost { virtual_ip: 172.30.0.5, virtual_mac: MacAddr(MacAddr6([168, 64, 37, 243, 152, 219])), physical_host_ip: fd00:1122:3344:10a::1, vni: Vni(10225803) }
09:00:01.267Z INFO SledAgent (dropshot (SledAgent)): request completed
local_addr = [fd00:1122:3344:107::1]:12345
method = PUT
remote_addr = [fd00:1122:3344:102::4]:43840
req_id = bcc4fef9-789c-4e73-a1f7-2f249ecde50c
response_code = 204
uri = /v2p/b7e955a4-36e3-4d74-ae3b-ad053ae8097b
09:00:01.816Z INFO SledAgent (InstanceManager): Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:107::2a, prefix: 64 }))
instance_id = f1e6ed32-cb42-4b71-a7ac-893ac46467f1
zone = oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4
09:00:02.006Z ERRO SledAgent (InstanceManager): instance setup failed: Err(ZoneEnsureAddress(EnsureAddressError(EnsureAddressError { zone: "oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4", request: Static(V6(Ipv6Network { addr: fd00:1122:3344:107::2a, prefix: 64 })), name: AddrObject { interface: "oxControlInstance8", name: "omicron6" }, err: Zone execution error: Command [/usr/sbin/zlogin oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4 /usr/sbin/ipadm create-addr -t -T addrconf oxControlInstance8/ll] executed and failed with status: exit status: 1 stdout:
stderr: ipadm: Could not create address: Addrconf already in progress
Caused by:
Command [/usr/sbin/zlogin oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4 /usr/sbin/ipadm create-addr -t -T addrconf oxControlInstance8/ll] executed and failed with status: exit status: 1 stdout:
stderr: ipadm: Could not create address: Addrconf already in progress })))
instance_id = f1e6ed32-cb42-4b71-a7ac-893ac46467f1
09:00:02.007Z INFO SledAgent (InstanceManager): Publishing instance state update to Nexus
instance_id = f1e6ed32-cb42-4b71-a7ac-893ac46467f1
state = InstanceRuntimeState { run_state: Failed, sled_id: 7230a95e-44ac-42ef-8dbd-1183d39193c7, propolis_id: d00f74ec-80ea-4419-80e9-ec9b3bbf83f4, dst_propolis_id: None, propolis_addr: Some([fd00:1122:3344:107::2a]:12400), migration_id: None, propolis_gen: Generation(1), ncpus: InstanceCpuCount(4), memory: ByteCount(2147483648), hostname: "web-instance-2", gen: Generation(3), time_updated: 2023-06-29T09:00:02.006892393Z }
09:00:02.052Z INFO SledAgent (dropshot (SledAgent)): request completed
error_message_external = Internal Server Error
error_message_internal = Failed to create address Static(V6(Ipv6Network { addr: fd00:1122:3344:107::2a, prefix: 64 })) with name oxControlInstance8/omicron6 in oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4: Zone execution error: Command [/usr/sbin/zlogin oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4 /usr/sbin/ipadm create-addr -t -T addrconf oxControlInstance8/ll] executed and failed with status: exit status: 1 stdout: \n stderr: ipadm: Could not create address: Addrconf already in progress
local_addr = [fd00:1122:3344:107::1]:12345
method = PUT
remote_addr = [fd00:1122:3344:102::4]:38064
req_id = 6206c886-bea7-4e1f-8126-54c12ea873e0
response_code = 500
uri = /instances/f1e6ed32-cb42-4b71-a7ac-893ac46467f1/state
09:00:02.111Z INFO SledAgent (dropshot (SledAgent)): accepted connection
local_addr = [fd00:1122:3344:107::1]:12345
remote_addr = [fd00:1122:3344:102::4]:41840
09:00:02.111Z WARN SledAgent (InstanceManager): Halting and removing zone: oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4
instance_id = f1e6ed32-cb42-4b71-a7ac-893ac46467f1
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: AdmError { op: Uninstall, zone: "oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4", err: CommandOutput(CommandOutputError("exit code 1\nstdout:\n\nstderr:\nzoneadm: zone 'oxz_propolis-server_d00f74ec-80ea-4419-80e9-ec9b3bbf83f4': uninstall operation is invalid for shutting_down zones.")) }', sled-agent/src/instance.rs:535:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Jun 29 09:00:10 Stopping because all processes in service exited. ]
[ Jun 29 09:00:10 Executing stop method (:kill). ]
[ Jun 29 09:00:10 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
[ Jun 29 09:00:10 Method "start" exited with status 0. ]
note: configured to log to "/dev/stdout"
09:00:12.732Z INFO SledAgent: Starting mg-ddm service
09:00:12.798Z INFO SledAgent: Importing mg-ddm service
path = /opt/oxide/mg-ddm/pkg/ddm/manifest.xml
09:00:13.023Z INFO SledAgent: Setting mg-ddm interfaces
interfaces = ("cxgbe0/ll" "cxgbe1/ll")
09:00:13.044Z INFO SledAgent: Enabling mg-ddm service
09:00:13.070Z INFO SledAgent: setting up bootstrap agent server
09:00:13.166Z INFO SledAgent: Ensuring zfs key directory exists
path = /var/run/oxide/
09:00:13.582Z INFO SledAgent: Sending prefix to ddmd for advertisement
DdmAdminClient = [::1]:8000
prefix = Ipv6Prefix { addr: fdb0:a840:2504:3d5::, len: 64 }
09:00:13.688Z WARN SledAgent: Deleting existing zone
zone_name = oxz_ntp
09:00:13.703Z WARN SledAgent: Deleting existing zone
zone_name = oxz_crucible_oxp_bd5d7d9f-58ca-4350-9083-6a92a6155a65
09:00:13.714Z WARN SledAgent: Deleting existing zone
zone_name = oxz_crucible_oxp_47d274ce-f4cb-4bc8-990a-b1460bd918c6
09:00:13.741Z WARN SledAgent: Deleting existing zone
zone_name = oxz_crucible_oxp_0cf8b90b-1143-4119-9012-1188c92036f2
09:00:13.756Z WARN SledAgent: Deleting existing zone
zone_name = oxz_crucible_oxp_7c1992a0-3f17-4672-b141-61ccab131c16
09:00:13.773Z WARN SledAgent: Deleting existing zone
zone_name = oxz_crucible_oxp_b155e4f4-facd-4a7b-a464-b965fc8e8cf5
09:00:13.787Z WARN SledAgent: Deleting existing zone
...
Aside from the delete-all-zones behavior (which is already being worked on), we probably also need to address sled-agent crashing in the face of an incompatible-state error: "uninstall operation is invalid for shutting_down zones". The panic comes from a Result::unwrap() at sled-agent/src/instance.rs:535; the error handling there could be less catastrophic.
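To illustrate what "less catastrophic" might look like, here is a minimal sketch of handling a failed uninstall without unwrapping the result. It is not the actual sled-agent code: `UninstallError`, `ZoneAdm`, `Removal`, `remove_zone`, and `ScriptedAdm` are all hypothetical names invented for this example, and the real `AdmError` type seen in the panic message is not used. The idea is to treat "zone is in the wrong state" as retryable and to return every other error to the caller instead of panicking.

```rust
use std::collections::VecDeque;

/// Hypothetical error type for this sketch (not the real `AdmError`).
#[derive(Debug, Clone, PartialEq)]
enum UninstallError {
    /// Uninstall is not valid in the zone's current state,
    /// e.g. the zone is still shutting down.
    InvalidState(String),
    /// Any other failure; not retried here.
    Other(String),
}

/// Hypothetical abstraction over the zone administration command.
trait ZoneAdm {
    fn uninstall(&mut self, zone: &str) -> Result<(), UninstallError>;
}

/// Result of a removal attempt that did not hit a hard error.
#[derive(Debug, PartialEq)]
enum Removal {
    /// The zone was uninstalled on the given attempt (1-based).
    Removed { attempts: u32 },
    /// The zone never left the invalid state; the caller can try again later.
    Deferred { reason: String },
}

/// Try to uninstall `zone` up to `max_attempts` times.
/// An invalid-state error is retried; any other error is returned as-is.
/// Nothing in this function panics on an `Err`.
fn remove_zone<A: ZoneAdm>(
    adm: &mut A,
    zone: &str,
    max_attempts: u32,
) -> Result<Removal, UninstallError> {
    let mut last_reason = String::new();
    for attempt in 1..=max_attempts {
        match adm.uninstall(zone) {
            Ok(()) => return Ok(Removal::Removed { attempts: attempt }),
            Err(UninstallError::InvalidState(msg)) => {
                // A real implementation would likely wait or poll the
                // zone state before retrying; omitted in this sketch.
                last_reason = msg;
            }
            Err(other) => return Err(other),
        }
    }
    Ok(Removal::Deferred { reason: last_reason })
}

/// Test double that replays a fixed sequence of results.
struct ScriptedAdm {
    results: VecDeque<Result<(), UninstallError>>,
}

impl ScriptedAdm {
    fn new(results: Vec<Result<(), UninstallError>>) -> Self {
        Self { results: VecDeque::from(results) }
    }
}

impl ZoneAdm for ScriptedAdm {
    fn uninstall(&mut self, _zone: &str) -> Result<(), UninstallError> {
        // Once the script is exhausted, report success.
        self.results.pop_front().unwrap_or(Ok(()))
    }
}
```

With this shape, a caller that gets `Removal::Deferred` or an `Err` can log it and mark the instance as failed, rather than taking down the whole agent (and, on restart, every other zone on the sled).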
Originally posted by @askfongjojo in #3451 (comment)