Skip to content

Rework MGS internals for clarity#1583

Merged
jgallagher merged 1 commit into
mainfrom
mgs-cleanup-2
Aug 17, 2022
Merged

Rework MGS internals for clarity#1583
jgallagher merged 1 commit into
mainfrom
mgs-cleanup-2

Conversation

@jgallagher

Copy link
Copy Markdown
Contributor

This is the followup PR to #1579, reworking MGS proper to use the SingleSp introduced in that PR for faux-mgs. The changes include:

  • Removal of RecvHandler and all its supporting machinery; instead of
    a monster tokio task reponsible for recv'ing on all sockets, pairing
    up request/response IDs, etc., we spawn a SingleSp task for each
    switch port that manages all communication on that port's socket.
  • Removal of the timeout option to most SP calls. SingleSp already
    has inherent (configured) timeouts for retries. At the HTTP client
    level, the client can already establish its own timeouts (e.g., in the
    event the network connection between client and server goes down); we
    don't need separate timeouts for SP communications (which the client
    has no reason to know how to set).
  • The serial console websocket handling has moved from
    gateway-sp-comms to gateway.

@andrewjstone andrewjstone left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @jgallagher. The organization changes make this a lot cleaner IMO, and having each SP handled in its own task likely provides better scalability/performance.


#[derive(Debug)]
pub struct AttachedSerialConsole {
key: u64,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of key? Are multiple serial attachments to the same SP allowed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not simultaneously; the purpose of key here is to distinguish "detach any currently-attached serial console client" (i.e., serial_console_detach() above, which sends a key of None) from "detach this currently-attached serial console client (i.e., AttachedSerialConsoleSend::send() below, which sends a key of Some(self.key)). This is (maybe only partially?) explained below:

// The associated value is the connection key; if Some(_), only detach if
// the currently-attached key number matches. If None, detach any current
// connection. These correspond to "detach the current session" (performed
// automatically when a connection is closed) and "force-detach any session"
// (performed by a user).

In hindsight I should expand this: ultimately this is to avoid a race condition like this:

  1. Client A attaches to the serial console.
  2. Client B tries to attach and fails.
  3. Client B sends a "detach any existing clients", and then successfully attaches.
  4. Client A drops their connection, and sends a "detach my current connection".

With this ordering, we don't want step 4 to detach client B from step 3; the key makes this work because the key sent in step 4 won't match the key established in step 3, so the detach will be a noop (the fact that the key is different means client A's session has already been detached).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this makes perfect sense. I somewhat gathered that after writing this question from another comment in the code where it talked about attaching after detach, but wasn't 100% sure so left my question.

Base automatically changed from mgs-cleanup-1 to main August 17, 2022 16:36
The changes include:

* Removal of `RecvHandler` and all its supporting machinery; instead of
  a monster tokio task reponsible for recv'ing on all sockets, pairing
  up request/response IDs, etc., we spawn a `SingleSp` task for each
  switch port that manages all communication on that port's socket.
* Removal of the `timeout` option to most SP calls. `SingleSp` already
  has inherent (configured) timeouts for retries. At the HTTP client
  level, the client can already establish its own timeouts (e.g., in the
  event the network connection between client and server goes down); we
  don't need separate timeouts for SP communications (which the client
  has no reason to know how to set).
* The serial console websocket handling has moved from
  `gateway-sp-comms` to `gateway`.
@jgallagher jgallagher enabled auto-merge (squash) August 17, 2022 20:26
@jgallagher jgallagher merged commit a70c681 into main Aug 17, 2022
@jgallagher jgallagher deleted the mgs-cleanup-2 branch August 17, 2022 21:05
leftwo pushed a commit that referenced this pull request Dec 4, 2024
Skip jobs when reinitializing to `Faulted` (#1583)
Clear `repair_check_deadline` if repair is successfully started (#1581)
Update rust-version reqs to reflect reality (#1580)

Propolis:
60886290 update nvme-trace.d to match current probe definitions (#821)
8e5693bf Fix clippy lints for Rust 1.83
leftwo added a commit that referenced this pull request Dec 4, 2024
Crucible:
Skip jobs when reinitializing to `Faulted` (#1583)
Clear `repair_check_deadline` if repair is successfully started (#1581)
Update rust-version reqs to reflect reality (#1580)

Propolis:
Update nvme-trace.d to match current probe definitions (#821) 
Fix clippy lints for Rust 1.83
PHD: use stty to widen the effective terminal for Linux guests (#818)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants