Skip to content

MGS: Implement "/sp" endpoint to fetch state for all SPs#746

Merged
jgallagher merged 4 commits into
mainfrom
mgs-bulk-sp-state
Mar 17, 2022
Merged

MGS: Implement "/sp" endpoint to fetch state for all SPs#746
jgallagher merged 4 commits into
mainfrom
mgs-bulk-sp-state

Conversation

@jgallagher

Copy link
Copy Markdown
Contributor

There are three things I want to change after working on this PR:

  1. I need to add some tests, at least of the /sp endpoint added here given its complexity. I'll do this before merging, but I think it's fine to start reviewing, particularly if there are any nontrivial changes that would affect those tests.
  2. Error types are a bit of a mess and need some cleanup. I'll do this separately.
  3. The way SPs are identified is all over the place; sometimes it's an SpIdentifier, sometimes its a SocketAddr, sometimes it's an ignition target (which itself is just an index, and is sometimes u8 and sometimes usize). I've been punting on this until we have a bit better understanding of how MGS is going to interact with the management network and track rack topology, but it's pretty unwieldy at this point. I'll do this separately too.

There's also an open question of whether the SP communications should be separated out from gateway entirely so that they can be shared with RSS. I'm strongly inclined to do this (and try to at least make progress on items 2 and 3 above in doing so) even if RSS ends up calling MGS instead of communicating directly, just from a crate cleanliness/organization point of view. Thoughts welcome!

@jgallagher jgallagher requested a review from ahl March 10, 2022 18:49
@jgallagher

Copy link
Copy Markdown
Contributor Author

A couple basic tests are in place as of 90c49ca. I'd like to add more tests of some of the more complicated cases (e.g., an unresponsive SP), but that will require some more work in the SP simulator. May do that as part of this PR or as a followup, unsure at the moment.

@jgallagher

Copy link
Copy Markdown
Contributor Author

Force pushed to account for #770

@ahl ahl left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great stuff!

// TODO we're dropping the error on the floor here - how should
// we handle it? This is an SP that we actively failed to
// communicate with somehow, which isn't the same as
// "unresponsive". Should we fail the entire request? That's how

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situations might we hit this error?

  • something screwed up with the network configuration i.e. such that we get an error from the OS with an improper VLAN tag or something
  • a response from the SP that indicates an error... in which case that SP is in a weird state to be able to respond with an error but not with the simplest kind of message it might reasonably answer

Anything else?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difficulty in answering this question is why I want to do some error cleanup! I can think of at least one other case that has to be handled, although in practice I don't expect to ever see it absent some horrible deployment mismatch nightmare: the SP sends a non-error response of a type that doesn't make sense (e.g., we ask it for its state and it responds with "here's a list of my components").

Comment thread gateway/src/sp_comms/bulk_state_get.rs Outdated
Comment thread gateway/src/sp_comms/bulk_state_get.rs Outdated
Comment thread gateway/src/sp_comms/bulk_state_get.rs
Comment thread gateway/src/sp_comms/bulk_state_get.rs
Comment thread gateway/src/sp_comms/bulk_state_get.rs
Comment thread gateway/src/sp_comms/bulk_state_get.rs
@jgallagher jgallagher merged commit 4e60f4f into main Mar 17, 2022
@jgallagher jgallagher deleted the mgs-bulk-sp-state branch March 17, 2022 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants