server: enable secure clusters with authenticated gRPC, but without TLS #54007
Description
Epic: CRDB-12037
Discussed in #16188 (comment), #53404 and #53991: certain users choose to secure their communication at the network level (e.g. DB + apps on a private encrypted network), and for these cases mandating TLS on top is unnecessary and cumbersome.
For these users, it is desirable to enable secure clusters (with all authentication and authorization mechanisms in place) without mandating TLS.
Background
Today TLS is mandatory in the following cases:
- for node-node RPC connections
- for node-client RPC conns (for CLI admin commands)
- for SQL connections
- for HTTP connections (if --unencrypted-localhost-http is not given)
Some progress has been made by #44842 / #53991 towards supporting non-TLS SQL clients with --allow-sql-without-tls. We can likely introduce a similar flag for the HTTP endpoint.
However, the RPC connections are a bit more tricky: the gRPC protocol does not offer a standard authentication handshake; today CockroachDB relies on TLS to authenticate peers.
Why does this matter? For context, TLS offers 3 separate protections:
- confidentiality (encryption)
  - resistance to MITM observers
- tamper-resistance (hashes)
  - resistance to MITM attackers altering the data in-transit
- authentication (key handshake)
  - resistance to spoofing
  - resistance to escalation of privileges
  - protection against operational mistakes
With network-level security, the network security takes over confidentiality, tamper-resistance and authn for all the malicious attack scenarios. However, protection against operational mistakes is still relevant and thus some form of authentication remains useful.
Guide-level explanation
With this change in place, a cluster could start securely without TLS certificates configured.
When used without TLS, the following mechanisms are still used for authentication:
- for the admin UI, password login
- for SQL, the HBA authentication rules (including asking for passwords by default)
- for node-node connections, TBD (presumably --cluster-name)
Once a cluster has started securely without TLS, it would be possible to upgrade it into using TLS gracefully:
- generate node certs
- copy the node certs and CA public key to the node certs directory
- restart the cluster node by node while accepting either TLS or non-TLS node-node conns
- restart the cluster node by node a second time to accept only TLS node-node conns
(This double restart mechanism is similar to the one required to introduce --cluster-name.)
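The intermediate phase of the double restart, where a node accepts either TLS or non-TLS node-node conns, could be sketched roughly as below. This is only an illustration under assumptions, not CockroachDB code: the `connMode` type, its constants, and the `accept` / `acceptFirstByte` helpers are hypothetical names. It relies on the fact that a TLS connection opens with a handshake record (first byte 0x16), whereas a plaintext gRPC client opens with the HTTP/2 preface starting with `PRI`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// connMode is a hypothetical setting describing which node-node
// connections a node accepts during the rollout.
type connMode int

const (
	modePlainOnly connMode = iota // before the upgrade: non-TLS only (assumed)
	modeMixed                     // first restart: accept TLS or non-TLS
	modeTLSOnly                   // second restart: accept only TLS
)

// tlsHandshakeRecord is the TLS record content type for handshake
// messages; a ClientHello starts with this byte.
const tlsHandshakeRecord = 0x16

// acceptFirstByte decides, from the first byte a peer sent, whether the
// connection is allowed under the given mode.
func acceptFirstByte(mode connMode, first byte) bool {
	isTLS := first == tlsHandshakeRecord
	switch mode {
	case modePlainOnly:
		return !isTLS
	case modeMixed:
		return true
	case modeTLSOnly:
		return isTLS
	default:
		return false
	}
}

// accept peeks at the first byte without consuming it, so the stream can
// still be handed to a TLS or plaintext handler afterwards.
func accept(mode connMode, r *bufio.Reader) (bool, error) {
	b, err := r.Peek(1)
	if err != nil {
		return false, err
	}
	return acceptFirstByte(mode, b[0]), nil
}

func main() {
	// The HTTP/2 client preface, as sent first by a plaintext gRPC client.
	plain := bufio.NewReader(strings.NewReader("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"))
	for _, m := range []connMode{modePlainOnly, modeMixed, modeTLSOnly} {
		ok, err := accept(m, plain)
		fmt.Println(m, ok, err)
	}
}
```

With something like this, the first rolling restart would switch every node to the mixed mode, and the second restart would switch to the TLS-only mode once all peers are able to present certificates.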
Implementation details
Making this happen requires two separate changes:

- design and implement a gRPC authn mechanism

  For this, we'd likely introduce an HTTP header. We could have this present only the identity of the principal (i.e. make the server "trust" all incoming connections), or require a shared secret.

  (Maybe adding a shared secret is unnecessary, as --cluster-name can achieve this already.)

- change the RPC connection code to accept mixed TLS / non-TLS clusters when a flag is enabled.

  This is necessary to upgrade a cluster to use TLS while it is running. See the guide-level explanation above.

In order to avoid adding this connection mode also to the CLI client commands (to reduce complexity), we can choose to first address #51454.
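As a rough sketch of the header-based authn idea, the check on the receiving side might look like the following. Everything here is an assumption for illustration: the header names `x-crdb-principal` and `x-crdb-cluster-name` and the `authenticate` function are hypothetical, and the `headers` type only mimics the shape of gRPC metadata (lower-case keys mapped to lists of values) so the example stays within the standard library. The cluster name plays the role of the shared secret.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// Hypothetical header names; the proposal does not define them.
const (
	principalHeader = "x-crdb-principal"
	clusterHeader   = "x-crdb-cluster-name"
)

// headers mimics the shape of gRPC metadata: lower-case keys, value lists.
type headers map[string][]string

// firstValue returns the first value stored under key, if any.
func firstValue(h headers, key string) (string, bool) {
	v := h[key]
	if len(v) == 0 {
		return "", false
	}
	return v[0], true
}

// authenticate returns the claimed principal if the request names one and
// carries the expected cluster name. An empty wantCluster together with an
// absent cluster header corresponds to the "trust all incoming
// connections" variant.
func authenticate(h headers, wantCluster string) (string, error) {
	principal, ok := firstValue(h, principalHeader)
	if !ok || principal == "" {
		return "", errors.New("missing principal header")
	}
	cluster, _ := firstValue(h, clusterHeader)
	// Constant-time comparison, since the value acts as a shared secret.
	if subtle.ConstantTimeCompare([]byte(cluster), []byte(wantCluster)) != 1 {
		return "", errors.New("cluster name mismatch")
	}
	return principal, nil
}

func main() {
	h := headers{
		principalHeader: {"node"},
		clusterHeader:   {"prod"},
	}
	fmt.Println(authenticate(h, "prod"))
	fmt.Println(authenticate(h, "staging"))
}
```

In a real server this check would presumably run in a gRPC interceptor before any RPC handler. Note that without TLS the header travels in cleartext, so it guards against operational mistakes (e.g. a node joining the wrong cluster) rather than against a malicious peer.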
Epic: CRDB-549
Jira issue: CRDB-3811