Add code for creating, splitting, combining a RackSecret by andrewjstone · Pull Request #429 · oxidecomputer/omicron

andrewjstone · 2021-11-24T19:18:22Z

A new RackSecret type used to implement the "Trust Quorum" functionality of
the rack is provided in this commit. It uses Shamir secret sharing with Feldman
verification. Feldman verification was chosen for simplicity as it doesn't
require creating blinding shares and distributing those along with the actual
shares as with Pedersen verification. We can always change this later.

The NIST P-256 curve was chosen somewhat arbitrarily, although it is a popular
curve and works with Shamir secret sharing without any extra mechanisms required
by some other curves.
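
For readers unfamiliar with the scheme, here is a minimal sketch of plain Shamir split/combine. It is an illustration only and not the code in this PR: it works over a toy prime field (2^61 - 1) instead of the P-256 scalar field, it omits Feldman verification entirely, and the names `split`, `combine`, and the field helpers are hypothetical. The polynomial coefficients are passed in by the caller; in real use they must come from a cryptographically secure RNG.

```rust
/// Toy field modulus (the Mersenne prime 2^61 - 1). Assumption for
/// illustration only; the PR uses the P-256 scalar field.
const P: u64 = (1 << 61) - 1;

// Field arithmetic; all inputs are expected to be already reduced mod P.
fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Modular exponentiation by squaring.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Multiplicative inverse via Fermat's little theorem (a must be non-zero).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Evaluate f(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ... at x = 1..=n.
/// The threshold is coeffs.len() + 1.
fn split(secret: u64, coeffs: &[u64], n: u64) -> Vec<(u64, u64)> {
    (1..=n)
        .map(|x| {
            // Horner's rule, starting from the highest-degree coefficient.
            let mut y = 0;
            for &c in coeffs.iter().rev() {
                y = add(mul(y, x), c);
            }
            (x, add(mul(y, x), secret))
        })
        .collect()
}

/// Recover f(0) by Lagrange interpolation over the given shares.
fn combine(shares: &[(u64, u64)]) -> u64 {
    let mut secret = 0;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        // Basis polynomial at zero: product over j != i of xj / (xj - xi).
        let mut num = 1;
        let mut den = 1;
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                num = mul(num, xj);
                den = mul(den, sub(xj, xi));
            }
        }
        secret = add(secret, mul(yi, mul(num, inv(den))));
    }
    secret
}
```

With two coefficients the threshold is 3: any three of the shares returned by `split` reconstruct the secret through `combine`, while two shares interpolate to an unrelated value. Feldman verification would additionally publish commitments to the coefficients so each share can be checked before it is used.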

A very simple versioned protocol will be created to use this functionality over
SPDM channels. The use of a P-256 curve, Feldman verification, and bincode for
serialization will serve as version 1. It's unlikely there will be a need to
change this in the near future. Upgrade can be supported by transitioning to a
later version once all members of the quorum support the latest version of the
code.


smklein

Looks like a good start! @flihp, @kc8apf, have we taken a look at the security-critical crates being added as dependencies here (p256, rand, vsss-rs)? Are we happy with them, or should we track that some form of audit will be necessary?

In order to allow for encrypted storage on individual sleds without the need for a user to type a password at bootup, we utilize secret sharing across sleds, where a threshold number of sleds need to communicate in order to generate a `rack secret`. This rack secret can then be used to derive local encryption keys for individual sleds. This prevents an attacker who steals a subset of sleds or storage devices from obtaining any data. In fact, the control plane software does not even boot until the rack secret is reconstructed and the protected storage unlocked.

There are quite a few moving parts required in order to implement a trust quorum, some of which involve the service processor and hardware root of trust. This commit only implements the part of the trust quorum responsible for retrieving existing key shares over an unfinished SPDM channel. It runs entirely on the host machine as part of the sled-agent. The code builds upon the multicast discovery code in #404, the SPDM negotiation code in #407 and the secret sharing code in #429.

In the "normal" lifetime of an Oxide rack, a rack secret will be generated upon initialization of the new rack by the customer. The shares will then be distributed over SPDM channels to individual sleds such that they can be retrieved and combined at a later time when an individual sled or the entire rack reboots. The initial generation and distribution of shares is *not* part of this commit. We fake rack initialization through the completely insecure use of a configuration file provided as part of the `omicron-package` install that contains all key shares. The configuration file disables the trust quorum by default, so that the sled-agent continues to run on a single node. When enabled, share retrieval attempts will begin, and when a quorum of shares is received, the rack secret will be reconstructed and the rest of the control plane will begin to boot.

In order for this to work, the user also has to edit the config file to ensure that a different `sled_index` (which points to a given unique share) exists in each config file, and then the sled-agent must be restarted with `svcadm restart sled-agent`. The included config file only includes shares for 2 sleds, but a new one can be generated with the provided `gen_trust_quorum_config` program. Lastly, the location of the config file is given in the sled-agent smf file and passed through as `rack_secret_dir` in the `BootstrapConfig` struct.

The SPDM protocol is run over a 2-byte size header framed transport operating over a TCP stream. We generate a client and server to initialize this transport, perform SPDM negotiation, and then begin share retrieval. As noted in #407, only the negotiation phase of the SPDM protocol is currently implemented, and so we simply return the TCP-based transport when negotiation completes, and pretend for now that we are operating over a secure channel. This allows us to test out the end-to-end behavior before we have a production-ready SPDM implementation integrated.

This commit also makes a small change to the SPDM transport to provide for timeouts on `send` and `recv` operations, and no longer requires passing a logger to each call of `recv`.

In order to allow for encrypted storage on individual sleds without the need for a user to type a password at boot, we utilize secret sharing across sleds, where a threshold number of sleds need to communicate in order to generate a `rack secret`. This rack secret can then be used to derive local encryption keys for individual sleds. This prevents an attacker who steals a subset of sleds or storage devices from obtaining any data. In fact, the control plane software does not even boot until the rack secret is reconstructed and the protected storage unlocked.

There are quite a few moving parts required in order to implement a trust quorum, some of which involve the service processor and hardware root of trust. This commit only implements the part of the trust quorum responsible for retrieving existing key shares over an unfinished SPDM channel. It runs entirely on the host machine as part of the sled-agent. The code builds upon the multicast discovery code in #404, the SPDM negotiation code in #407 and the secret sharing code in #429.

In the "normal" lifetime of an Oxide rack, a rack secret will be generated upon initialization of the new rack by the customer. The shares will then be distributed over SPDM channels to individual sleds such that they can be retrieved and combined at a later time when an individual sled or the entire rack reboots. The initial generation and distribution of shares is *not* part of this commit. Instead, shares are individually distributed along with metadata as a `ShareDistribution` stored in a `share.json` file in the `sled_agent/pkg` directory under the install directory configured for `omicron-package install`. Share generation must be done manually for now, but a follow-up commit is coming for a deployment system that will generate the rack secret and distribute the shares along with the install of omicron.

If the `share.json` file is not present, the server operates in single-node mode and does not try to form a trust quorum. This behavior is required for backwards compatibility with current development and will eventually be removed.

The SPDM protocol is run over a 2-byte size header framed transport operating over a TCP stream. We generate a client and server to initialize this transport, perform SPDM negotiation, and then begin share retrieval. As noted in #407, only the negotiation phase of the SPDM protocol is currently implemented, and so we simply return the TCP-based transport when negotiation completes, and pretend for now that we are operating over a secure channel. This allows us to test out the end-to-end behavior before we have a production-ready SPDM implementation integrated.

This commit also makes a small change to the SPDM transport to provide for timeouts on `send` and `recv` operations, and no longer requires passing a logger to each call of `recv`.
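
As a rough illustration of what a 2-byte size header framed transport looks like, here is a minimal sketch using only the standard library. It is not the transport code from this commit: the function names `write_frame` and `read_frame` are hypothetical, and the big-endian byte order of the length header is an assumption, since the description does not state it. Timeouts are left out.

```rust
use std::io::{self, Read, Write};

/// Write one frame: a 2-byte length header followed by the payload.
/// Assumption: the header is big-endian; the source does not specify this.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    // A 2-byte header can describe at most 65535 payload bytes.
    if payload.len() > u16::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload exceeds 65535 bytes",
        ));
    }
    let len = payload.len() as u16;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame: first the 2-byte header, then exactly that many bytes.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 2];
    r.read_exact(&mut header)?;
    let len = u16::from_be_bytes(header) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both functions are generic over `Read`/`Write`, so the same code works against a `std::net::TcpStream` or an in-memory buffer in tests. For a blocking `TcpStream`, `set_read_timeout` and `set_write_timeout` are one way to bound `send` and `recv`, though the actual transport may handle timeouts differently.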

andrewjstone requested review from flihp, mx-shift and smklein November 24, 2021 19:18

andrewjstone force-pushed the secret-sharing branch from dd5ad12 to 1238ab1 Compare November 24, 2021 19:18

andrewjstone force-pushed the secret-sharing branch from 1238ab1 to 7aad8e1 Compare November 24, 2021 19:19

fix formatting

17d5f80

smklein reviewed Nov 29, 2021


smklein reviewed Nov 30, 2021


Comment thread sled-agent/src/bootstrap/rack_secret.rs

andrewjstone and others added 5 commits November 30, 2021 16:28

Merge branch 'main' into secret-sharing

e17be8d

Fixes for Sean's review

e825551

Whoops, missed one assert

c4b1d06

Merge branch 'main' into secret-sharing

b98c5ca

Merge branch 'main' into secret-sharing

3847b08

smklein approved these changes Dec 1, 2021


Comment thread sled-agent/Cargo.toml Outdated

Comment thread sled-agent/src/bootstrap/rack_secret.rs

Merge branch 'main' into secret-sharing

a0c6ad3

andrewjstone merged commit e5bfc81 into main Dec 1, 2021

andrewjstone deleted the secret-sharing branch December 1, 2021 19:02

andrewjstone mentioned this pull request Dec 6, 2021

Add initial trust quorum support #487

Merged

andrewjstone merged 8 commits into main from secret-sharing
