server: enable secure clusters with authenticated gRPC, but without TLS #54007

@knz

Epic: CRDB-12037

Discussed in #16188 (comment), #53404 and #53991: certain users choose to secure their communication at the network level (e.g. DB + apps on a private encrypted network), and for these cases mandating TLS on top is unnecessary and cumbersome.

For these users, it is desirable to enable secure clusters (with all authentication and authorization mechanisms in place) without mandating TLS.

Background

Today TLS is mandatory in the following cases:

  • for node-node RPC connections
  • for node-client RPC conns (for CLI admin commands)
  • for SQL connections
  • for HTTP connections (if --unencrypted-localhost-http is not given)

Some progress has been made in #44842 / #53991 towards supporting non-TLS SQL clients with --allow-sql-without-tls. We can likely introduce a similar flag for the HTTP endpoint.

However, the RPC connections are a bit more tricky: the gRPC protocol does not offer a standard authentication handshake; today CockroachDB relies on TLS to authenticate peers.

Why does this matter? For context, TLS offers three separate protections:

  • confidentiality (encryption)
    • resistance to MITM observers
  • tamper-resistance (hashes)
    • resistance to MITM attackers altering the data in-transit
  • authentication (key handshake)
    • resistance to spoofing
    • resistance to escalation of privileges
    • protection against operational mistakes

With network-level security, the network takes over confidentiality, tamper-resistance and authn for all the malicious attack scenarios. However, protection against operational mistakes is still relevant, so some form of authentication remains useful.

Guide-level explanation

With this change in place, a cluster could start securely without TLS certificates configured.

When used without TLS, the following mechanisms are still used for authentication:

  • for the admin UI, password login
  • for SQL, the HBA authentication rules (including asking for passwords by default)
  • for node-node connections, TBD (presumably --cluster-name)

Once a cluster has started securely without TLS, it would be possible to upgrade it gracefully to use TLS:

  1. generate node certs
  2. copy the node certs and CA public key to the node certs directory
  3. restart the cluster node by node while accepting either TLS or non-TLS node-node conns
  4. restart the cluster node by node a second time to accept only TLS node-node conns

(This double restart mechanism is similar to the one required to introduce --cluster-name.)
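The accept side of this rollout can be sketched as below. This is only an illustration under stated assumptions: the `connMode` type, its three values and the helper names are hypothetical and do not exist in CockroachDB, and the sketch deliberately leaves open which protocol a node uses for its outgoing connections during each phase. It relies on two wire-level facts: a TLS connection opens with a handshake record whose first byte is 0x16, while a plaintext gRPC (HTTP/2) connection opens with the client preface starting with "PRI".

```go
package main

// connMode is a hypothetical per-node setting describing which incoming
// node-node connections are accepted. It is not an existing CockroachDB type.
type connMode int

const (
	plaintextOnly connMode = iota // cluster started securely without TLS
	mixed                         // after the first rolling restart
	tlsOnly                       // after the second rolling restart
)

// tlsHandshakeRecord is the TLS record content type that starts a ClientHello.
const tlsHandshakeRecord = 0x16

// looksLikeTLS classifies a connection from its first bytes. In a real
// listener the prefix would be peeked without being consumed, so the bytes
// can still be handed to the TLS or plaintext handler afterwards.
func looksLikeTLS(prefix []byte) bool {
	return len(prefix) > 0 && prefix[0] == tlsHandshakeRecord
}

// acceptsPeer reports whether a node in mode m accepts an incoming
// connection, given whether the peer dialed with TLS.
func acceptsPeer(m connMode, peerUsesTLS bool) bool {
	switch m {
	case plaintextOnly:
		return !peerUsesTLS
	case mixed:
		return true
	case tlsOnly:
		return peerUsesTLS
	}
	return false
}
```

With this shape, step 3 corresponds to switching every node to `mixed`, and step 4 to switching every node to `tlsOnly`.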

Implementation details

Making this happen requires two separate changes:

  • design and implement a gRPC authn mechanism

    For this, we'd likely introduce an HTTP header. We could have this present only the identity of the principal (i.e. make the server "trust" all incoming connections), or require a shared secret.

    (Maybe adding a shared secret is unnecessary, as --cluster-name can achieve this already.)

  • change the RPC connection code to accept mixed TLS / non-TLS clusters when a flag is enabled.

    This is necessary to upgrade a cluster to use TLS while it is running. See the guide-level explanation above.
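As a rough illustration of the header-based authn idea in the first bullet, here is a minimal sketch that covers both variants: trusting the presented identity, or additionally requiring a shared secret. The header names, the `authenticate` function and the error value are all hypothetical. The `map[string][]string` argument mirrors the shape of gRPC request metadata, so the check could plausibly sit in a server interceptor, but nothing here is an existing CockroachDB or gRPC API.

```go
package main

import (
	"crypto/subtle"
	"errors"
)

// Hypothetical header names, chosen only for this illustration.
const (
	principalHeader = "x-crdb-principal"
	secretHeader    = "x-crdb-shared-secret"
)

var errUnauthenticated = errors.New("unauthenticated")

// authenticate returns the principal announced in the request headers.
// If sharedSecret is empty, the server trusts whatever identity the peer
// presents. Otherwise the peer must also present the matching secret.
func authenticate(md map[string][]string, sharedSecret string) (string, error) {
	principals := md[principalHeader]
	if len(principals) != 1 || principals[0] == "" {
		return "", errUnauthenticated
	}
	if sharedSecret != "" {
		secrets := md[secretHeader]
		// Constant-time comparison avoids leaking the secret through timing.
		if len(secrets) != 1 ||
			subtle.ConstantTimeCompare([]byte(secrets[0]), []byte(sharedSecret)) != 1 {
			return "", errUnauthenticated
		}
	}
	return principals[0], nil
}
```

Since the network is assumed to be secured already, the shared-secret variant mainly guards against operational mistakes such as a node contacting the wrong cluster, which is the same role the issue suggests --cluster-name could play.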

In order to avoid adding this connection mode also to the CLI client commands (to reduce complexity), we can choose to first address #51454.

Epic: CRDB-549

Jira issue: CRDB-3811

Metadata

    Labels

      • A-authentication: Pertains to authn subsystems
      • A-kv-server: Relating to the KV-level RPC server
      • A-security
      • A-server-networking: Pertains to network addressing, routing, initialization
      • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
      • T-server-and-security: DB Server & Security
