This function is used to collect information about zpools on the system, which the sled agent then manages as storage devices for things like the databases or Crucible. It collects several fields for each pool, such as the total size and the number of allocated bytes. However, for faulted pools these values are not available, and zpool prints '-' in their place. Specifically, we'd see:
bnaecker@feldspar : ~/omicron $ zpool list -Hpo name,size,allocated,free,health oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b - - - FAULTED
bnaecker@feldspar : ~/omicron $
The code in that function attempts to parse the string '-' as a number, which fails. This can prevent the sled agent from making any further progress, which we can see in the log as:
{"msg":"failed to start sled agent","v":0,"name":"SledAgent","level":40,"time":"2022-08-11T18:17:33.839069674Z","hostname":"feldspar","pid":2455,"component":"BootstrapAgentRssHandler","error":"ServerFailure(\"Sled agent request failed: Error starting sled agent: Could not start sled agent server: Error managing storage: Failed to get info for zpool 'oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b': Failed to parse output: Failed to parse field 'size': invalid digit found in string\")"}
We probably want to handle this more gracefully, not trying to parse out data if the pool is faulted.
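One possible shape for this, as a rough sketch rather than the actual sled agent code: treat '-' as "no value" and only report sizes when all three are present. The names `ZpoolHealth`, `ZpoolSizes`, `ZpoolInfo`, `parse_size`, and `parse_zpool_line` are hypothetical and chosen for illustration. The sketch assumes the five-column output of the command shown above, and splits on any whitespace so it accepts both the tab-separated output of -H and the space-separated form pasted here.

```rust
use std::str::FromStr;

/// Pool health as printed in the `health` column (hypothetical type).
#[derive(Debug, PartialEq)]
enum ZpoolHealth {
    Online,
    Degraded,
    Faulted,
    Offline,
    Removed,
    Unavailable,
}

impl FromStr for ZpoolHealth {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ONLINE" => Ok(Self::Online),
            "DEGRADED" => Ok(Self::Degraded),
            "FAULTED" => Ok(Self::Faulted),
            "OFFLINE" => Ok(Self::Offline),
            "REMOVED" => Ok(Self::Removed),
            "UNAVAIL" => Ok(Self::Unavailable),
            other => Err(format!("Unrecognized health: {other}")),
        }
    }
}

/// Size fields, only available when zpool reports numbers for all of them.
#[derive(Debug, PartialEq)]
struct ZpoolSizes {
    size: u64,
    allocated: u64,
    free: u64,
}

#[derive(Debug, PartialEq)]
struct ZpoolInfo {
    name: String,
    health: ZpoolHealth,
    /// `None` when the pool reported '-' for any size field.
    sizes: Option<ZpoolSizes>,
}

/// Parse one numeric column, mapping '-' to `None` instead of an error.
fn parse_size(field: &str, value: &str) -> Result<Option<u64>, String> {
    if value == "-" {
        return Ok(None);
    }
    value
        .parse::<u64>()
        .map(Some)
        .map_err(|e| format!("Failed to parse field '{field}': {e}"))
}

/// Parse one line of `zpool list -Hpo name,size,allocated,free,health`.
fn parse_zpool_line(line: &str) -> Result<ZpoolInfo, String> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    let (name, size, allocated, free, health) = match fields.as_slice() {
        [n, s, a, f, h] => (*n, *s, *a, *f, *h),
        _ => return Err(format!("Expected 5 fields, found {}", fields.len())),
    };

    let health = health.parse::<ZpoolHealth>()?;
    let sizes = match (
        parse_size("size", size)?,
        parse_size("allocated", allocated)?,
        parse_size("free", free)?,
    ) {
        (Some(size), Some(allocated), Some(free)) => {
            Some(ZpoolSizes { size, allocated, free })
        }
        // At least one column was '-': report the pool without sizes.
        _ => None,
    };

    Ok(ZpoolInfo { name: name.to_string(), health, sizes })
}
```

With something like this, the faulted pool above would parse successfully with `sizes` set to `None`, and the caller could decide whether to skip the pool or surface its health, rather than failing sled agent startup. Genuinely malformed numbers would still be reported as parse errors.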