Add remote signing to substrate client

The following proposes adding remote signing functionality to the Substrate client.

# Context

The security of Proof of Stake networks lies in the hands of validators; without the security these entities provide, the whole system falls apart. A validator is responsible for operating its nodes in a stable, reliable, consistent, and secure manner. This responsibility includes managing signing keys: the keys that let the network know that _they_ were the ones who verified that the activity they put on the network is non-byzantine.

For validators, the current paradigm of storing hot session keys in the client leaves much to be desired in terms of security. Although session keys cannot directly access funds, a compromise of the validator host (and the session keys on it) can lead to a complete loss of funds for the validator and for those nominating them, because the keys can be used to equivocate and trigger slashing. Furthermore, there is a direct financial incentive to compromise these keys, as up to 10% of the slash can be rewarded to whoever reports it. While key rotation mitigates this to an extent, a more robust approach to key storage and signing will be required in the long run.

Separating the storage and signing of session keys from the validator host would allow validators to build more robust and flexible operations, while adding layers of defense against possible attack vectors. Even a full compromise of the validator host should not create conditions under which the validator can be slashed. Separating out key storage means adding a remote signing interface, which makes it possible to use a remote signing server; ideally one with double signing protection and support for HSM, TEE, Ledger, and TSS. This raises the cost of compromising validator operations, which makes the network more resilient and secure in the long run.

## Remote Signing Server

The following describes the approach of one possible remote signing server, although the interfaces exposed by the Substrate client should allow multiple implementations to exist. The signing server proposed here would live as a Rust module in a separate repository; these considerations are included for reference and context.

A remote signing server should be flexible enough to accommodate a variety of key management approaches, including TEE, HSM, cloud HSM, Ledger, and encrypted software-based key storage. It should also be able to support multiple Substrate-based chains, effectively acting as a single API for all key management and signing.

### Approach

The signing server should run as a separate process on a physical on-premise host. Cloud-based deployments should be considered as well, although they are less preferred.

One approach is an inverse connection: the remote signer makes an outbound encrypted connection to the validator host, which listens at the multiaddr URL specified by the Substrate client CLI flag `--keystore-server-url <URL>`. The remote signer would not accept any inbound traffic, reducing its attack surface. It is then the signer's responsibility to keep the connection to the Substrate client open. After the initial connection is made, the remote signing server listens for RPC requests from the validator host, handles each one by creating the appropriate signature or payload, and sends the response back to the validator host.
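A minimal sketch of the signer side of the inverse connection, assuming a plain TCP transport for brevity; the encrypted channel and multiaddr resolution are out of scope. `backoff_delay` and `connect_with_retry` are hypothetical names. The signer never binds a listening socket; it only dials out, and retries with capped exponential backoff so it can re-establish the connection if the validator host restarts.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Capped exponential backoff: base, 2*base, 4*base, ... never exceeding `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // `1 << attempt` overflows u32 for attempt >= 32; treat any overflow as "use the cap".
    match 1u32.checked_shl(attempt).and_then(|factor| base.checked_mul(factor)) {
        Some(delay) => delay.min(max),
        None => max,
    }
}

/// Dial the validator host (outbound only). The signer opens no listening
/// socket, so it accepts no inbound traffic. Transport encryption is omitted here.
pub fn connect_with_retry(
    addr: SocketAddr,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> io::Result<TcpStream> {
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no connection attempts made");
    for attempt in 0..max_attempts {
        match TcpStream::connect_timeout(&addr, Duration::from_secs(5)) {
            Ok(stream) => return Ok(stream),
            Err(e) => {
                last_err = e;
                // Back off before the next attempt; no need to sleep after the last one.
                if attempt + 1 < max_attempts {
                    thread::sleep(backoff_delay(attempt, base, max));
                }
            }
        }
    }
    Err(last_err)
}
```

In a long-running signer, the returned stream would be handed to the request loop, and `connect_with_retry` called again whenever that loop observes the connection closing.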

### RPC API Spec

Requests and responses between the Substrate client and the signing server should be tagged to indicate what to sign and how. These tags would be specific to the module making the request, such as `GRANDPA` or `BABE`.

One could imagine the following types of RPC requests/responses:
- `GrandpaPrevoteRequest` / `GrandpaPrevoteResponse`
- `GrandpaPrecommitRequest` / `GrandpaPrecommitResponse`
- `BabeVRFRequest` / `BabeVRFResponse`
- `BabeAuthorRequest` / `BabeAuthorResponse`

The specifics of these should be discussed, with the goal of minimizing the changes needed in the Substrate client.
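One hypothetical way to tag requests is a single enum whose variants mirror the request types listed above, so the signing server can route each request to the right key and, later, to the right double signing check. The field names (`set_id`, `round`, `target_hash`, `slot`, and so on) and the `Refused` response are assumptions for illustration, not Substrate's actual wire format.

```rust
/// Session-key purposes the signing server can route to (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyPurpose {
    Grandpa,
    Babe,
}

/// Requests from the Substrate client, tagged by the consensus module that issued them.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SignRequest {
    GrandpaPrevote { set_id: u64, round: u64, target_hash: [u8; 32], target_number: u32 },
    GrandpaPrecommit { set_id: u64, round: u64, target_hash: [u8; 32], target_number: u32 },
    BabeVrf { epoch: u64, slot: u64 },
    BabeAuthor { slot: u64, header_hash: [u8; 32] },
}

/// Responses carry the same tag back so the client can match them to requests.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SignResponse {
    GrandpaPrevote { signature: Vec<u8> },
    GrandpaPrecommit { signature: Vec<u8> },
    BabeVrf { output: Vec<u8>, proof: Vec<u8> },
    BabeAuthor { signature: Vec<u8> },
    /// The server declined to sign, e.g. because signing would equivocate.
    Refused { reason: String },
}

impl SignRequest {
    /// Which session key must be used to answer this request.
    pub fn key_purpose(&self) -> KeyPurpose {
        match self {
            SignRequest::GrandpaPrevote { .. } | SignRequest::GrandpaPrecommit { .. } => {
                KeyPurpose::Grandpa
            }
            SignRequest::BabeVrf { .. } | SignRequest::BabeAuthor { .. } => KeyPurpose::Babe,
        }
    }
}
```

Carrying structured fields (rather than opaque bytes) is what would later let the server enforce double signing rules per request type.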

### Configuration

The signing server can be configured via a config file that is loaded when the remote signing server starts. Since one design goal is to support flexible ways of storing keys, this file would specify the key provider (what stores the keys), the key type, the validator host, and so forth.

The following is a non-exhaustive list of some possible configuration parameters:

- `validator`
  - Specifies a top-level validator node. Since operators will likely run multiple validator nodes, each of the following parameters is specified _per validator_.
  - `name`
    - A name to differentiate multiple validators.
  - `chain`
    - The Substrate-based chain handled by the remote signing server, e.g. `kusama`, `polkadot`, `dev`, `flaming-fir`, `parachain-id`, etc.
  - `validator-multiaddr`
    - The multiaddr of the validator host with which the remote signing server should initiate a connection.
  - `validator-connection`
    - Details of how the connection to the validator host should be handled, including the secrets involved in initiating a handshake for an encrypted connection.
  - `key`
    - A unique key used by this validator. Each validator will have multiple keys, and each defines the type of key, what it is used for, and where it is stored.
    - `purpose`
      - `babe`, `grandpa`, `authority-discovery`, etc.
    - `key-provider`
      - The type of keystore medium, e.g. `YubiHSM2`, `TEE`, `Ledger`, `AWS CloudHSM`, etc.
    - `key-type`
      - The type of key, such as `sr25519`, `ed25519`, `bls12-381`, etc.
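The parameters above could map onto Rust types roughly as follows. This is a sketch: the type names and the validation rules (unique validator names, at most one key per purpose per validator) are assumptions, `validator-connection` is omitted because its secrets format is undecided, and parsing from an actual file format is left out.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Purpose { Babe, Grandpa, AuthorityDiscovery }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyProvider { Soft, YubiHsm2, Tee, Ledger, AwsCloudHsm }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyType { Sr25519, Ed25519, Bls12_381 }

#[derive(Debug, Clone)]
pub struct KeyConfig {
    pub purpose: Purpose,
    pub key_provider: KeyProvider,
    pub key_type: KeyType,
}

#[derive(Debug, Clone)]
pub struct ValidatorConfig {
    pub name: String,
    pub chain: String,               // e.g. "kusama", "polkadot", "dev"
    pub validator_multiaddr: String, // e.g. "/ip4/10.0.0.1/tcp/30333"
    pub keys: Vec<KeyConfig>,
}

/// Reject configurations that would make request routing ambiguous.
pub fn validate(validators: &[ValidatorConfig]) -> Result<(), String> {
    let mut names = HashSet::new();
    for v in validators {
        if !names.insert(v.name.as_str()) {
            return Err(format!("duplicate validator name `{}`", v.name));
        }
        let mut purposes = HashSet::new();
        for k in &v.keys {
            if !purposes.insert(k.purpose) {
                return Err(format!("validator `{}` has more than one {:?} key", v.name, k.purpose));
            }
        }
    }
    Ok(())
}
```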


### CLI

The remote signing server would likely have a CLI for setup, debugging, and deployment.

One could imagine the following possible commands:
- `generate`
  - This would generate new keys. It could take flags such as:
    - `--val-name`: the name of the validator the key belongs to
    - `--purpose`: with options `grandpa`, `babe`, `aura`, etc.
    - `--key-type`: with options `sr25519`, `ed25519`, `bls12-381`
    - `--key-provider`: with options `soft`, `yubihsm`, `sgx`, `ledger`, or others
- `add`
  - This would add existing keys to the keystore. It could take flags such as:
    - `--val-name`: the name of the validator the key belongs to
    - `--purpose`: with options `grandpa`, `babe`, `aura`, etc.
    - `--key-type`: with options `sr25519`, `ed25519`, `bls12-381`
    - `--key-provider`: with options `soft`, `yubihsm`, `sgx`, `ledger`, or others
- `rotate-keys`
  - This would rotate all or a subset of keys.
- `ping`
  - This would test signing with the configured keys to ensure they work as expected, which is particularly useful for hardware-backed keys.
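A dependency-free sketch of how the subcommands and flags above could be parsed into a typed command. A real tool would more likely use an argument-parsing crate; the `Command` and `KeyArgs` types are hypothetical, and flag values are kept as plain strings for brevity.

```rust
#[derive(Debug, PartialEq, Eq, Default)]
pub struct KeyArgs {
    pub val_name: String,
    pub purpose: String,
    pub key_type: String,
    pub key_provider: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Generate(KeyArgs),
    Add(KeyArgs),
    RotateKeys,
    Ping,
}

/// Parse arguments without the program name, e.g. ["generate", "--val-name", "val-1", ...].
pub fn parse(args: &[&str]) -> Result<Command, String> {
    let (sub, rest) = args.split_first().ok_or("missing subcommand")?;
    match *sub {
        "generate" => parse_key_args(rest).map(Command::Generate),
        "add" => parse_key_args(rest).map(Command::Add),
        "rotate-keys" => Ok(Command::RotateKeys),
        "ping" => Ok(Command::Ping),
        other => Err(format!("unknown subcommand `{other}`")),
    }
}

fn parse_key_args(rest: &[&str]) -> Result<KeyArgs, String> {
    let mut out = KeyArgs::default();
    // Flags come in `--flag value` pairs.
    for pair in rest.chunks(2) {
        let [flag, value] = pair else {
            return Err(format!("flag `{}` is missing a value", pair[0]));
        };
        let slot = match *flag {
            "--val-name" => &mut out.val_name,
            "--purpose" => &mut out.purpose,
            "--key-type" => &mut out.key_type,
            "--key-provider" => &mut out.key_provider,
            _ => return Err(format!("unknown flag `{flag}`")),
        };
        *slot = value.to_string();
    }
    if out.val_name.is_empty() {
        return Err("`--val-name` is required".into());
    }
    Ok(out)
}
```

From `main`, the process arguments could be collected with `std::env::args().skip(1)`, converted to `&str` slices, and passed to `parse`.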


### Key Providers

The following describes some key providers and the benefits and trade-offs they offer.

#### HSMs

HSMs, or hardware security modules, store keys securely in hardware. They use tamper-resistant secure elements that prevent key extraction and allow payloads to be signed without ever exposing the private keys to the host. Since generated keys never leave the device, an attacker who compromises the validator host still cannot access them.

One issue with most HSMs, however, is that they are dumb signing oracles: they sign whatever they receive without verifying it. On its own, this provides little more protection against equivocation than soft signing. If the validator host is compromised, an attacker can still request signatures, even though they cannot extract the keys themselves. This approach is therefore most useful with a remote signing server that also has double signing protection.
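The difference between a dumb signing oracle and a protected signer can be sketched as a wrapper: the key provider only knows how to sign bytes, while a policy check in front of it decides whether a request may be signed at all. The `KeyProvider` trait, `CountingHsm` mock, and `Guarded` wrapper are hypothetical, and the mock returns placeholder bytes rather than a real signature. In practice, the `allow` policy is where double signing protection would plug in.

```rust
/// A key provider in the "dumb oracle" sense: it signs whatever it is given.
pub trait KeyProvider {
    fn sign(&mut self, payload: &[u8]) -> Vec<u8>;
}

/// Mock HSM that only counts how many payloads it has signed.
#[derive(Default)]
pub struct CountingHsm {
    pub signed: usize,
}

impl KeyProvider for CountingHsm {
    fn sign(&mut self, payload: &[u8]) -> Vec<u8> {
        self.signed += 1;
        payload.iter().rev().copied().collect() // placeholder, NOT a real signature
    }
}

/// Puts a policy check in front of any provider; refused payloads never reach the key.
pub struct Guarded<P, F> {
    provider: P,
    allow: F,
}

impl<P: KeyProvider, F: FnMut(&[u8]) -> bool> Guarded<P, F> {
    pub fn new(provider: P, allow: F) -> Self {
        Self { provider, allow }
    }

    /// Returns `None` when the policy refuses the payload.
    pub fn sign(&mut self, payload: &[u8]) -> Option<Vec<u8>> {
        if (self.allow)(payload) {
            Some(self.provider.sign(payload))
        } else {
            None
        }
    }

    pub fn provider(&self) -> &P {
        &self.provider
    }
}
```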


#### TEE

A remote signer operating within a TEE such as SGX or TrustZone provides stronger security than file-based key storage.

[Here's one approach as to how this can be used in this type of situation.](https://github.com/scs/substraTEE/blob/master/validator-protection/VALIDATOR_PROTECTION_PROPOSALS.md)

#### Ledger

Ledgers work very well as HSM-like solutions, as they are programmable (so double signing protection can be built into the software running on the device). They are also cheap, widely available, and easily accessible. In production datacenters, [these can work surprisingly well](https://medium.com/cryptium-cosmos/launching-cosmos-with-double-sign-protection-in-hardware-and-resilience-to-host-compromise-d572c75e7081).

## Substrate Client

The Substrate client would need to be modified to support fetching keys and signatures externally.

The first step is to implement an RPC server for sending and receiving requests. This would involve either creating a new module, `keystore-server`, or modifying the `keystore` module to include this functionality.

The RPC server would start when an additional CLI flag, `--keystore-server <CONNECTION_SECRET>`, is given to a Substrate client. When this flag is given, the RPC server will begin listening for the remote signing server to initiate a handshake. `CONNECTION_SECRET` is needed to complete the handshake, and from an operator's perspective it should be handled with a secrets management service like [HashiCorp Vault](https://www.vaultproject.io/). Additionally, another flag, `--keystore-server-url <URL>`, could specify the URL or port that the RPC server listens on.

If the Substrate node is started with the `--keystore-server` flag, it would wait until the handshake completes before it starts producing and finalizing blocks.
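Holding block production until the handshake completes could be modeled as a one-shot gate that authorship and finality tasks wait on. A sketch using only the standard library's `Mutex` and `Condvar`; `HandshakeGate` and its methods are hypothetical, and the timeout is an assumption so a waiting task can log and retry rather than block forever.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Opened once the remote signer has completed the handshake.
#[derive(Default)]
pub struct HandshakeGate {
    ready: Mutex<bool>,
    cv: Condvar,
}

impl HandshakeGate {
    /// Called by the keystore server after a successful handshake.
    pub fn open(&self) {
        *self.ready.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Called by block authorship / finality before requesting any signature.
    /// Returns `true` if the handshake completed within `timeout`.
    pub fn wait(&self, timeout: Duration) -> bool {
        let guard = self.ready.lock().unwrap();
        // Sleep while the gate is still closed (handles spurious wakeups).
        let (guard, _timed_out) = self
            .cv
            .wait_timeout_while(guard, timeout, |ready| !*ready)
            .unwrap();
        *guard
    }
}
```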

Changes would also be needed in how the Substrate client fetches keys and creates signatures. One approach is to modify the client's `keystore` so that it abstracts over whether signing happens locally in the client or remotely on the signing server. This abstraction would define an interface that both the local signer (perhaps within `keystore`) and the external signer implement. Either a new `keystore-server` module or the modified `keystore` would be responsible for generating the requests sent to the external signing server, and the consensus modules would need to delegate the creation of those requests to `keystore`/`keystore-server`.
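The abstraction could look roughly like the following: consensus code depends only on a signing trait, and the client picks a local or remote implementation at startup. This is a sketch under stated assumptions: `SessionSigner`, `LocalSigner`, `RemoteSigner`, and `spawn_mock_server` are hypothetical (not Substrate's actual keystore API), the 4-byte key type tag is an assumption, the connection to the signing server is modeled with in-process channels, and the "signatures" are placeholders.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// What consensus code needs from a keystore, regardless of where the keys live.
pub trait SessionSigner {
    fn sign(&self, key_type: [u8; 4], payload: &[u8]) -> Result<Vec<u8>, String>;
}

/// Keys held inside the client process (today's behaviour). Placeholder output.
pub struct LocalSigner;

impl SessionSigner for LocalSigner {
    fn sign(&self, key_type: [u8; 4], payload: &[u8]) -> Result<Vec<u8>, String> {
        Ok([&key_type[..], payload].concat()) // NOT a real signature
    }
}

/// A forwarded request plus a one-shot channel for the server's reply.
type Forwarded = (([u8; 4], Vec<u8>), Sender<Result<Vec<u8>, String>>);

/// Forwards every request over a channel standing in for the encrypted connection.
pub struct RemoteSigner {
    to_server: Sender<Forwarded>,
}

impl SessionSigner for RemoteSigner {
    fn sign(&self, key_type: [u8; 4], payload: &[u8]) -> Result<Vec<u8>, String> {
        let (reply_tx, reply_rx) = channel();
        self.to_server
            .send(((key_type, payload.to_vec()), reply_tx))
            .map_err(|_| "signing server disconnected".to_string())?;
        reply_rx.recv().map_err(|_| "no reply from signing server".to_string())?
    }
}

/// Spawn a stand-in "signing server" that answers with `LocalSigner`'s placeholder.
pub fn spawn_mock_server() -> RemoteSigner {
    let (tx, rx): (Sender<Forwarded>, Receiver<Forwarded>) = channel();
    thread::spawn(move || {
        for ((key_type, payload), reply) in rx {
            let _ = reply.send(LocalSigner.sign(key_type, &payload));
        }
    });
    RemoteSigner { to_server: tx }
}
```

Because consensus code would only hold a `Box<dyn SessionSigner>`, switching between local and remote signing becomes a startup decision driven by the `--keystore-server` flag rather than a change in each consensus module.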


## Double Signing Protection

Although a remote signer adds a layer of security compared to the current status quo, an attacker who compromises the validator host can still trigger a double sign by invoking the remote signing server. To mitigate this, double signing protection should eventually be built into the remote signing server. If the Substrate client is compromised, the signing server should be able to prevent equivocation, or anything else that would result in a correspondingly severe slash for the validator.

To do this, the remote signing server would need to keep track of state so that it cannot produce or finalize conflicting blocks.

In Tezos, double signing protection is implemented by keeping track of a high watermark for endorsements and block headers. The high watermark is the highest level signed so far, and no block header or endorsement will be signed at a lower level than the previous block or endorsement.

In Cosmos, this is done by keeping track of the last signed Height, Round, Step (HRS). When asked to sign a new block, the signer will only sign it if it has a higher HRS.
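Both schemes reduce to the same idea: record the position of the last signed message and refuse anything that is not strictly ahead of it. Deriving `Ord` on a struct compares fields in declaration order, which gives the lexicographic `(height, round, step)` comparison for free; a Tezos-style watermark is the same check on the level alone. A minimal in-memory sketch with hypothetical names; a real signer would need to persist the watermark durably before releasing each signature, or a restart could reset it.

```rust
/// Position of a signed message; field order defines the lexicographic ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hrs {
    pub height: u64,
    pub round: u32,
    pub step: u8,
}

#[derive(Default)]
pub struct Watermark {
    last: Option<Hrs>,
}

impl Watermark {
    /// Returns `true` (and advances the watermark) only if `next` is strictly higher.
    pub fn try_advance(&mut self, next: Hrs) -> bool {
        match self.last {
            Some(last) if next <= last => false,
            _ => {
                self.last = Some(next);
                true
            }
        }
    }
}
```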

Thus, double signing protection will need to be built separately for each of the following:
- BABE double signing protection
- GRANDPA double signing protection
- Parachain ID double signing protection
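A single HRS watermark does not map directly onto BABE and GRANDPA, so each needs its own rule. In BABE, equivocation means authoring two different blocks for the same slot; in GRANDPA, it means casting two different prevotes (or two different precommits) in the same round of the same authority set. The sketch below refuses a conflicting request but allows re-signing an identical message, which cannot equivocate. `EquivocationGuard` and its methods are hypothetical, state is kept in memory only (a real signer would persist and prune it), and parachain-specific rules are not covered.

```rust
use std::collections::HashMap;

pub type BlockHash = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Stage {
    Prevote,
    Precommit,
}

#[derive(Default)]
pub struct EquivocationGuard {
    /// BABE: slot -> hash of the header already signed for that slot.
    babe: HashMap<u64, BlockHash>,
    /// GRANDPA: (set_id, round, stage) -> (target hash, target number) already voted for.
    grandpa: HashMap<(u64, u64, Stage), (BlockHash, u32)>,
}

impl EquivocationGuard {
    /// Allow signing `header_hash` for `slot` unless a different header was signed before.
    /// The first request for a slot is recorded before the signature is produced.
    pub fn check_babe(&mut self, slot: u64, header_hash: BlockHash) -> bool {
        *self.babe.entry(slot).or_insert(header_hash) == header_hash
    }

    /// Allow a GRANDPA vote unless a different target was already signed at this position.
    pub fn check_grandpa(&mut self, set_id: u64, round: u64, stage: Stage, target: (BlockHash, u32)) -> bool {
        *self.grandpa.entry((set_id, round, stage)).or_insert(target) == target
    }
}
```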

## High Availability

Having both remote signing and double signing protection opens the way to high availability (active/active) setups that would increase the resiliency of the network and of validator operations. One possibility this unlocks is an MPC-based HA keystore server in which _m_-of-_n_ threshold signatures are required to produce the signature returned to the validator host. This depends on [#11](https://github.com/w3f/schnorrkel/issues/11), but ultimately creates an extremely robust setup in which compromising a validator becomes substantially more costly and difficult than under the current status quo.

## v1
A first version would have minimal functionality, likely using session keys as they are now but isolated within a remote signing server. HSM interfaces and double signing protection should be the next steps.

# Discussion

- What should the API spec of signing look like?
  - How should GRANDPA, BABE, AuRa, im-online, etc. requests/responses be structured?
  - How can this be modular enough to handle multiple types of key storage? (HSM, Ledger, TEE, soft signing)
  - How should keys be identified or bundled?
- What should the RPC server in the Substrate client look like? Should it be its own module, or should the existing `keystore` be modified for this logic?
- What are some approaches to prevent double signing for BABE, GRANDPA, AuRa, etc.?
- What should configuring the signing server look like?
- What should the cli interface of the signing server look like?

Add remote signing to substrate client #4689

Context

Remote Signing Server

Approach

RPC API Spec

Configuration

CLI

Key Providers

HSMs

TEE

Ledger

Substrate Client

Double Signing Protection

High Availabilty

v1

Discussion

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add remote signing to substrate client #4689

Description

Context

Remote Signing Server

Approach

RPC API Spec

Configuration

CLI

Key Providers

HSMs

TEE

Ledger

Substrate Client

Double Signing Protection

High Availabilty

v1

Discussion

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions