Summary
While preparing for the demo on lab machine "sock" the other week, I tried tearing down the stack and building it up again to make sure that the setup works as expected. With a bunch of help in chat, we found that nothing today tears down several pieces of state, and the leftovers cause other teardown scripts to fail. The summary (mostly thanks to @bnaecker) was that omicron-package uninstall should:
- delete IP interfaces created by Sled Agent (these seem to be prefixed with net)
- delete all service zone VNICs, which are prefixed by ox
- delete all guest VNICs, which look like vopteX
- delete all xde devices. This could be a library call or shell out to opteadm delete-xde.
- unload the xde driver, in the absence of an xde ioctl that does the opposite of opteadm set-xde-underlay
It makes me a little nervous to have automation remove stuff by prefix, especially a generic prefix like net, but I don't know enough about how this state is managed to suggest a safer approach.
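To make the prefix rules above concrete, here is a minimal sketch that only labels names and deletes nothing. The function name label_teardown_candidates and the label strings are hypothetical, and the matching is the same plain prefix matching described above, so it carries the same risk of catching unrelated links named net-something.

```shell
# Hypothetical helper: read link or interface names on stdin, one per line,
# and print "name<TAB>kind" for each name that matches one of the prefixes
# from the list above. Names that match no prefix are skipped.
# This only labels names; it does not delete anything.
label_teardown_candidates() {
    while IFS= read -r name; do
        case "$name" in
            net*)   printf '%s\tip-interface\n' "$name" ;;
            ox*)    printf '%s\tservice-zone-vnic\n' "$name" ;;
            vopte*) printf '%s\tguest-vnic\n' "$name" ;;
            opte*)  printf '%s\txde-device\n' "$name" ;;
        esac
    done
}
```

On the lab machine, something like dladm show-link -p -o link | label_teardown_candidates would give a list to review by hand before removing anything.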
How you know you've hit this
Most of this was found by trying to run destroy_virtual_hardware.sh until it succeeded. Until these issues are resolved, you'll see errors like this:
dladm delete-vnic net0
dladm: vnic deletion failed: link busy
+ RC=1
+ [[ 1 -eq 0 ]]
+ warn 'Failed to delete VNIC link net0'
+ echo -e '\e[1;31mFailed to delete VNIC link net0\e[0m'
Failed to delete VNIC link net0
Workarounds
While I was setting up the stack, I worked around these issues by:
- finding the IP interfaces and L2 devices that need to be removed with ipadm and dladm show-link, respectively
- manually removing IP interfaces that start with net using pfexec ipadm delete-if net0
- manually removing the "vopte" VNICs with pfexec dladm delete-vnic vopte0
- manually removing the "opte" xde devices with pfexec /opt/oxide/opte/bin/opteadm delete-xde opte0
- manually unloading the "opte" kernel module. Use modinfo | grep xde to find its id (first column) and then pfexec modunload -i <id> to unload it.
Note: if you leave the pfexec off the ipadm delete-if command, you may run into illumos bug 14724 (since fixed, but "sock" has not been updated as of this writing).
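The manual steps above can be sketched as two small helpers that print what to run rather than running it, so the output can be checked first. Both function names (cleanup_command, xde_module_ids) are hypothetical; the commands they print are the ones from the workaround list, and the module id lookup assumes, as above, that the id is the first column of the modinfo line mentioning xde.

```shell
# Hypothetical helper: print (do not run) the manual cleanup command for one
# link or interface name. Returns non-zero for names it does not recognize.
cleanup_command() {
    case "$1" in
        net*)   echo "pfexec ipadm delete-if $1" ;;
        vopte*) echo "pfexec dladm delete-vnic $1" ;;
        opte*)  echo "pfexec /opt/oxide/opte/bin/opteadm delete-xde $1" ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: read modinfo output on stdin and print the first
# column (the module id) of every line that mentions xde. This mirrors
# "modinfo | grep xde" followed by reading the id off by eye.
xde_module_ids() {
    awk '/xde/ { print $1 }'
}
```

For example, cleanup_command vopte0 prints the dladm delete-vnic line, and modinfo | xde_module_ids prints the id to pass to pfexec modunload -i.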