MGS: Implement "/sp" endpoint to fetch state for all SPs by jgallagher · Pull Request #746 · oxidecomputer/omicron

jgallagher · 2022-03-10T18:49:48Z

There are three things I want to change after working on this PR:

1. I need to add some tests, at least of the /sp endpoint added here given its complexity. I'll do this before merging, but I think it's fine to start reviewing, particularly if there are any nontrivial changes that would affect those tests.
2. Error types are a bit of a mess and need some cleanup. I'll do this separately.
3. The way SPs are identified is all over the place; sometimes it's an SpIdentifier, sometimes it's a SocketAddr, sometimes it's an ignition target (which itself is just an index, and is sometimes u8 and sometimes usize). I've been punting on this until we have a somewhat better understanding of how MGS is going to interact with the management network and track rack topology, but it's pretty unwieldy at this point. I'll do this separately too.
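
To make the third item concrete, here is a rough sketch of one way the identifiers could be tied together. It is not the code in this PR: the field layout of `SpIdentifier`, the `SpType` variants, and the `IgnitionTarget`, `SpRecord`, and `SpDirectory` types are all assumptions made up for illustration. The idea is to wrap the ignition index in a newtype with a single width (so the u8/usize ambiguity lives in one conversion) and keep one table that maps between the three ways of naming an SP.

```rust
use std::convert::TryFrom;
use std::net::SocketAddr;
use std::num::TryFromIntError;

/// Hypothetical SP kinds; the variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SpType {
    Sled,
    Switch,
    Power,
}

/// Assumed shape of a logical SP identifier (kind plus slot number).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SpIdentifier {
    typ: SpType,
    slot: u32,
}

/// Newtype around the ignition index so it has exactly one width.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct IgnitionTarget(u8);

impl TryFrom<usize> for IgnitionTarget {
    type Error = TryFromIntError;

    /// Fails instead of silently truncating when the index exceeds u8.
    fn try_from(index: usize) -> Result<Self, Self::Error> {
        u8::try_from(index).map(IgnitionTarget)
    }
}

impl From<IgnitionTarget> for usize {
    fn from(target: IgnitionTarget) -> usize {
        usize::from(target.0)
    }
}

/// One row tying together the three ways an SP is currently named.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SpRecord {
    id: SpIdentifier,
    target: IgnitionTarget,
    addr: SocketAddr,
}

/// Hypothetical lookup table; linear scans are fine for a rack-sized list.
#[derive(Debug, Default)]
struct SpDirectory {
    records: Vec<SpRecord>,
}

impl SpDirectory {
    fn insert(&mut self, record: SpRecord) {
        self.records.push(record);
    }

    fn by_id(&self, id: SpIdentifier) -> Option<&SpRecord> {
        self.records.iter().find(|r| r.id == id)
    }

    fn by_target(&self, target: IgnitionTarget) -> Option<&SpRecord> {
        self.records.iter().find(|r| r.target == target)
    }

    fn by_addr(&self, addr: SocketAddr) -> Option<&SpRecord> {
        self.records.iter().find(|r| r.addr == addr)
    }
}
```

With something like this, callers would convert at the edge once (for example `IgnitionTarget::try_from(index)`) and pass the typed value around, rather than carrying raw indices of differing widths.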

There's also an open question of whether the SP communications should be separated out from gateway entirely so that they can be shared with RSS. I'm strongly inclined to do this (and try to at least make progress on items 2 and 3 above in doing so) even if RSS ends up calling MGS instead of communicating directly, just from a crate cleanliness/organization point of view. Thoughts welcome!

jgallagher · 2022-03-11T22:22:04Z

A couple of basic tests are in place as of 90c49ca. I'd like to add more tests of some of the more complicated cases (e.g., an unresponsive SP), but that will require some more work in the SP simulator. I may do that as part of this PR or as a followup; unsure at the moment.

Exercises the `/sp` endpoint to get state of all SPs.

jgallagher · 2022-03-16T17:03:15Z

Force pushed to account for #770

ahl

great stuff!

ahl · 2022-03-16T20:43:25Z

+                // TODO we're dropping the error on the floor here - how should
+                // we handle it? This is an SP that we actively failed to
+                // communicate with somehow, which isn't the same as
+                // "unresponsive". Should we fail the entire request? That's how


In what situations might we hit this error?

- Something is wrong with the network configuration, such that we get an error from the OS (an improper VLAN tag, for example).
- The SP sends a response that indicates an error, in which case that SP is in a weird state: able to respond with an error, but not with the simplest kind of message it might reasonably answer.

Anything else?

jgallagher · 2022-03-17

The difficulty in answering this question is why I want to do some error cleanup! I can think of at least one other case that has to be handled, although in practice I don't expect to ever see it absent some horrible deployment mismatch nightmare: the SP sends a non-error response of a type that doesn't make sense (e.g., we ask it for its state and it responds with "here's a list of my components").
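
The three cases discussed in this thread (an OS or network-level failure, an error reported by the SP itself, and a well-formed response of the wrong kind) could be kept apart with an error enum along these lines. This is only a sketch under assumptions: `SpCommError`, `SpResponse`, and `expect_state` are hypothetical names invented for illustration, and the real message types differ.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for the messages an SP might send back.
#[derive(Debug)]
enum SpResponse {
    State(String),
    Components(Vec<String>),
    Error(String),
}

/// Hypothetical error type separating the failure modes from the thread.
#[derive(Debug)]
enum SpCommError {
    /// The local OS or network stack failed before we got a reply.
    Io(io::Error),
    /// The SP replied, but its reply was an error.
    SpReported(String),
    /// The SP replied with a non-error message of the wrong kind.
    UnexpectedResponse {
        expected: &'static str,
        got: &'static str,
    },
}

impl fmt::Display for SpCommError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SpCommError::Io(e) => write!(f, "I/O error talking to SP: {}", e),
            SpCommError::SpReported(msg) => write!(f, "SP reported an error: {}", msg),
            SpCommError::UnexpectedResponse { expected, got } => {
                write!(f, "expected {} response, got {}", expected, got)
            }
        }
    }
}

impl std::error::Error for SpCommError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SpCommError::Io(e) => Some(e),
            _ => None,
        }
    }
}

/// Classify the outcome of asking an SP for its state.
fn expect_state(resp: Result<SpResponse, io::Error>) -> Result<String, SpCommError> {
    match resp {
        Err(e) => Err(SpCommError::Io(e)),
        Ok(SpResponse::State(state)) => Ok(state),
        Ok(SpResponse::Error(msg)) => Err(SpCommError::SpReported(msg)),
        Ok(SpResponse::Components(_)) => Err(SpCommError::UnexpectedResponse {
            expected: "state",
            got: "components",
        }),
    }
}
```

Keeping the variants distinct would let the bulk endpoint decide per SP whether a failure should be reported alongside the other results or should fail the whole request, which is the open question in the TODO.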

jgallagher requested a review from ahl March 10, 2022 18:49

jgallagher added 3 commits March 16, 2022 13:01

MGS: Implement "/sp" endpoint to fetch state for all SPs

16ae6f8

MGS: Add initial integration tests using simulated SPs

c708a62

Exercises the `/sp` endpoint to get state of all SPs.

MGS bulk SP state: add test of unresponsive SP

bf114de

jgallagher force-pushed the mgs-bulk-sp-state branch from 0424cb3 to bf114de Compare March 16, 2022 17:02

jgallagher mentioned this pull request Mar 16, 2022

MGS: Start gateway-sp-comms crate with a ManagementSwitch #777

Merged

ahl reviewed Mar 17, 2022

updates from PR review

36aab2a

jgallagher merged commit 4e60f4f into main Mar 17, 2022

jgallagher deleted the mgs-bulk-sp-state branch March 17, 2022 17:22


jgallagher merged 4 commits into main from mgs-bulk-sp-state
