server: integrate the TLS auto-negotiation inside the crdb code #60632
Closed
Labels
- A-authentication: Pertains to authn subsystems
- A-security
- A-server-start-drain: Pertains to server startup and shutdown sequences
- C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
- T-server-and-security: DB Server & Security
Description
Fixes #32448.
Epic: CRDB-6663
Jira issue: CRDB-3167
Meta-issue to track the implementation for #51991
A couple of new components:
- existing CM manager remains unchanged, not involved in the discussion here
- new cert generation primitives in the `security` package (server: added initial cert utilities for automatic cert generation, #60705)
- initial cert handshake HTTP server
- initial cert handshake HTTP client
- cert creation loop with check of which CAs/certs are already provided by the operator (server: added initial cert utilities for automatic cert generation, #60705)
- (maybe in later phase) extend the cert rotation code with automatic handshake of the new batch of certs
- join token generation and delivering a cert bundle for new nodes being added (server, security: Token-based add/join TLS functionality, #63492)
- security: Add version number to base64 form of join tokens (#64885)
- end-to-end TCL unit test for the node join functionality
- Have `connect` commands use a configurable timeout (#64878)
- mechanism to trigger the rotation automatically
- host cert + init token check during `start` to decide whether to engage `connect` or start as usual (server: integrate the TLS auto-negotiation in the `start` commands, #63850)
- cli/connect: make `join` warn/error the user if the target directory already contains certs (#64944)
Follow-up work:
- ping the folk on #38539 (roachprod: Clusters should default to secure) to tell them to use the new code in roachprod
- server: clarify / improve the TLS join behavior for already-running, not-yet-initialized clusters (#64934)
Clean-up work:
- security: Reduce code duplication between auto_tls_init and the rest of the security package (#64883)
- security: Have TLS auto-join/init code use CertificateManager (#64884)
CLI commands:
- `cockroach connect` - new, only responsible for the TLS handshake and writing the certs to disk.
  This will leverage the first 3 components identified above: cert gen primitives + HTTP client/server for handshake.
- `cockroach start` - when provided an init token, must check whether the host cert is already known and, if not, start the TLS handshake before the remainder of the start code (server: integrate the TLS auto-negotiation in the `start` commands, #63850)
- `cockroach start-single-node` - new flag `--self-secure-init` that auto-generates an init token and proceeds as per the `start` logic
- `cockroach demo` - will be modified to leverage the self secure init code added to `start-single-node`
- Add `--init-token-file` flag to protect the init handshake shared secret (cli: `--init-token` exposes the init token to the `ps` command, #61231)
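The check described for `cockroach start` can be sketched roughly as follows. This is only an illustration of the decision, not the actual crdb code: the names `startAction`, `decideStartAction` and `hostCertExists` are hypothetical, and the file name `node.crt` is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// startAction is a hypothetical enum for the two paths `start` can take.
type startAction int

const (
	startAsUsual      startAction = iota // proceed with the regular start code
	runHandshakeFirst                    // run the TLS handshake, then start
)

// decideStartAction engages the handshake only when an init token was
// provided and the host cert is not known yet.
func decideStartAction(initToken string, hostCertKnown bool) startAction {
	if initToken != "" && !hostCertKnown {
		return runHandshakeFirst
	}
	return startAsUsual
}

// hostCertExists reports whether a host cert file is present in certsDir.
// The file name "node.crt" is an assumption for this sketch.
func hostCertExists(certsDir string) (bool, error) {
	_, err := os.Stat(filepath.Join(certsDir, "node.crt"))
	if err == nil {
		return true, nil
	}
	if errors.Is(err, os.ErrNotExist) {
		return false, nil
	}
	return false, fmt.Errorf("checking host cert: %w", err)
}

func main() {
	// Use an empty temporary directory to stand in for the certs directory.
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	known, err := hostCertExists(dir)
	if err != nil {
		panic(err)
	}
	if decideStartAction("some-init-token", known) == runHandshakeFirst {
		fmt.Println("no host cert yet: run the TLS handshake before starting")
	} else {
		fmt.Println("start as usual")
	}
}
```

Keeping the decision itself free of file-system access makes it straightforward to unit test separately from the cert lookup.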
Bugs:
- Nodes should populate handshake messages with advertise address instead of listen (server: `cockroach connect` does not negotiate peers across NAT properly, #61238)
- `node.key` permissions are incorrect with a default umask
- `cockroach connect join` gets confused by CA key file format (#64942; security: accept CA keys in either PKCS#1 or PKCS#8 encodings, #64943)
- flaky test (stress): server: TestInitHandshakeWrongToken failed (#61538)
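For the CA key format bug, accepting either encoding could look something like the sketch below. It uses only the Go standard library; `parseCAPrivateKey` and `roundTripBothEncodings` are hypothetical names for illustration, and nothing here is taken from the actual fix in #64943.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseCAPrivateKey decodes the first PEM block and tries PKCS#8 first,
// then falls back to PKCS#1 (RSA only).
func parseCAPrivateKey(pemBytes []byte) (crypto.PrivateKey, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	if key, err := x509.ParsePKCS8PrivateKey(block.Bytes); err == nil {
		return key, nil
	}
	if key, err := x509.ParsePKCS1PrivateKey(block.Bytes); err == nil {
		return key, nil
	}
	return nil, fmt.Errorf("unsupported key encoding in PEM block %q", block.Type)
}

// roundTripBothEncodings generates a throwaway RSA key, serializes it in
// both encodings, and checks that each one parses back to the same key.
func roundTripBothEncodings() error {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return err
	}
	pkcs1 := pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	der8, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return err
	}
	pkcs8 := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der8})

	for name, data := range map[string][]byte{"PKCS#1": pkcs1, "PKCS#8": pkcs8} {
		parsed, err := parseCAPrivateKey(data)
		if err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
		if !key.Equal(parsed) {
			return fmt.Errorf("%s: parsed key differs from original", name)
		}
	}
	return nil
}

func main() {
	if err := roundTripBothEncodings(); err != nil {
		panic(err)
	}
	fmt.Println("both encodings parsed")
}
```

Trying the parsers in sequence avoids relying on the PEM block type, which tools do not always set consistently.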
Technical question where the answer is needed as prereq to a number of points above:
- how are the CN and OU fields populated
- how is the SAN field populated
  - current assumption for prototype/MVP: the addresses provided on `--join` go into the SAN? or maybe `--listen-addr`? (Unsure, this is under-specified)
  - need a practical test with a multi-server experiment, to understand the design constraint
  - there may be some flags / extra logic needed to pick up reasonable + valid addresses to populate SAN
Possible action item: perform that experiment
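To make the SAN question concrete, here is a minimal sketch of how a list of `host:port` addresses could be turned into SAN entries on a certificate template. Which addresses should be fed in (`--join`, `--listen-addr`, or something else) is exactly the open question above, so the input list is an assumption, and `sanTemplate` is a hypothetical helper name.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
)

// sanTemplate splits each host:port address and records the host as an IP
// SAN when it parses as an IP address, or as a DNS SAN otherwise.
// Addresses with an empty host (e.g. ":26257") are skipped.
func sanTemplate(addrs []string) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{}
	for _, addr := range addrs {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", addr, err)
		}
		if host == "" {
			continue
		}
		if ip := net.ParseIP(host); ip != nil {
			tmpl.IPAddresses = append(tmpl.IPAddresses, ip)
		} else {
			tmpl.DNSNames = append(tmpl.DNSNames, host)
		}
	}
	return tmpl, nil
}

func main() {
	// Example addresses are made up for illustration.
	tmpl, err := sanTemplate([]string{"node1.example.com:26257", "10.0.0.5:26257"})
	if err != nil {
		panic(err)
	}
	fmt.Println("DNS SANs:", tmpl.DNSNames)
	fmt.Println("IP SANs:", tmpl.IPAddresses)
}
```

A multi-server experiment would show whether the addresses picked this way are actually the ones peers use to reach each node, for example across NAT.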