[update] add missing sleds to contact_support checks#10476
Conversation
| /// Build a map of version strings to the number of components on that | ||
| /// version | ||
| async fn component_version_counts( | ||
| async fn get_internal_update_status( |
There was a problem hiding this comment.
Most of the work was already done here, I just extracted the bit that was creating internal_views::UpdateStatus and used that to determine missing sleds.
sunshowers
left a comment
There was a problem hiding this comment.
Looks good overall! Just a few questions and comments.
| .iter() | ||
| .filter(|sled| { | ||
| // `unknown()` returns the represenation of the update status | ||
| // for a given sled ID that isn't present in inventory. |
There was a problem hiding this comment.
Is this case also hit when a sled is present in inventory but there's no last reconciliation status?
There was a problem hiding this comment.
yep
omicron/nexus/types/src/internal_api/views.rs
Lines 1010 to 1012 in bca885a
I can update the comment to explain this
There was a problem hiding this comment.
That would be great -- thanks!
| **sled | ||
| == internal_views::SledAgentUpdateStatus::unknown( | ||
| sled.sled_id, | ||
| ) | ||
| }) |
There was a problem hiding this comment.
Hmm, looking at SledAgentUpdateStatus, the unknown state isn't recorded as an enum variant. This is pre-existing but I'm a little worried about the fragility of things like string equality here:
omicron/nexus/types/src/internal_api/views.rs
Lines 677 to 681 in 2fb3663
There was a problem hiding this comment.
I fear that this fix may be a bit more involved than it seems at a first glance #10271 (comment)
Do you mind if I open up a follow up issue for this work?
| } | ||
| }; | ||
|
|
||
| let missing_sleds: BTreeSet<SledUuid> = self |
There was a problem hiding this comment.
Does omdb need to show this information? Seems like for support it would be better than grepping through logs.
There was a problem hiding this comment.
Yeah, that'd be really nice. There was a discussion about that here -> #9412 (comment)
I think this work may need its own issue though. Thoughts?
There was a problem hiding this comment.
Yeah, a followup sounds good.
| let expected_sleds = self | ||
| .datastore() | ||
| .sled_list_all_batched( | ||
| opctx, | ||
| SledFilter::SpsUpdatedByReconfigurator, | ||
| ) | ||
| .await? |
There was a problem hiding this comment.
Hmm so this will do a live db query, but inventory is cached. Does this mean that when a sled is newly commissioned, contact_support might be true for a short period of time?
There was a problem hiding this comment.
Yeah, probably. This is also true for other checks in various circumstances. I'll add these caveats to the support runbooks. It's worth mentioning that users would only hit this case if they wish to perform an update or check the update status immediately after a sled is newly commissioned, which is unlikely?
There was a problem hiding this comment.
Yeah, that sounds good. I do feel like in the limit this is worth fixing, but it's pretty far down the list.
| async fn component_version_counts( | ||
| &self, | ||
| status: internal_views::UpdateStatus, | ||
| ) -> Result<BTreeMap<String, usize>, Error> { |
There was a problem hiding this comment.
Two notes:
- This doesn't need to either be
asyncor take&self, I think. - Small nit: thoughts on reworking this to take a reference to
UpdateStatus?
There was a problem hiding this comment.
Yeah, you're right. I extracted the bits that I needed but didn't check the signature to see if it still made sense. I'll fix this
| .sleds | ||
| .iter() | ||
| .filter(|sled| { | ||
| // `unknown()` returns the represenation of the update status |
There was a problem hiding this comment.
nit:
| // `unknown()` returns the represenation of the update status | |
| // `unknown()` returns the representation of the update status |
karencfv
left a comment
There was a problem hiding this comment.
Thanks for taking a look! I think I've addressed all your comments, let me know if I missed anything :)
| .iter() | ||
| .filter(|sled| { | ||
| // `unknown()` returns the represenation of the update status | ||
| // for a given sled ID that isn't present in inventory. |
There was a problem hiding this comment.
yep
omicron/nexus/types/src/internal_api/views.rs
Lines 1010 to 1012 in bca885a
I can update the comment to explain this
| } | ||
| }; | ||
|
|
||
| let missing_sleds: BTreeSet<SledUuid> = self |
There was a problem hiding this comment.
Yeah, that'd be really nice. There was a discussion about that here -> #9412 (comment)
I think this work may need its own issue though. Thoughts?
| **sled | ||
| == internal_views::SledAgentUpdateStatus::unknown( | ||
| sled.sled_id, | ||
| ) | ||
| }) |
There was a problem hiding this comment.
I fear that this fix may be a bit more involved than it seems at a first glance #10271 (comment)
Do you mind if I open up a follow up issue for this work?
| async fn component_version_counts( | ||
| &self, | ||
| status: internal_views::UpdateStatus, | ||
| ) -> Result<BTreeMap<String, usize>, Error> { |
There was a problem hiding this comment.
Yeah, you're right. I extracted the bits that I needed but didn't check the signature to see if it still made sense. I'll fix this
Follow up to #10271
Closes #4745