Raft gRPC: add support for mTLS for encryption and peer identity #3890

@lvca

Context

The Ratis gRPC transport used for Raft log replication, leader election, and snapshot chunk transfer currently has no peer authentication and no in-transit encryption. Today the only inter-node protections are:

  • X-ArcadeDB-Cluster-Token header on the HTTP side channels (snapshot download, database verify)
  • Network-level isolation (K8s NetworkPolicy, private subnet, VPN) called out as the recommended hardening step in RaftHAServer.startService() warnings

The Raft gRPC port remains an unauthenticated back door: any host that can reach the port can open a gRPC stream to the Ratis server and inject log entries.

A follow-up change will add a ServerTransportFilter-based peer-address allowlist to close the "random host on the network knows the port" case out of the box. That mitigation is IP-based and is defeated by IP spoofing on a flat L2 network or by a compromised peer. Production deployments need cryptographic peer identity and in-transit encryption.

Goal

Add optional mTLS to the Raft gRPC transport using the GrpcTlsConfig mechanism already supported by Apache Ratis (GrpcConfigKeys.TLS.setConf, GrpcConfigKeys.Admin.setTlsConf, GrpcConfigKeys.Client.setTlsConf, GrpcConfigKeys.Server.setTlsConf).

When enabled:

  • Every Raft gRPC connection between nodes negotiates TLS with mutual client-certificate authentication.
  • Peer identity is bound to the certificate (CN/SAN), verified against a shared cluster CA.
  • All AppendEntries / InstallSnapshot / RequestVote traffic is encrypted in transit.
  • Unauthenticated or non-CA-signed peers are rejected at the TLS handshake.
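To illustrate what "identity bound to the certificate (SAN)" could mean in practice, here is a minimal sketch that pulls the dNSName entries out of the structure returned by the JDK's X509Certificate.getSubjectAlternativeNames() and compares them with a set of expected peer hosts. PeerSanCheck, dnsNames and matchesKnownPeer are hypothetical names, not ArcadeDB or Ratis APIs, and whether such a check runs in addition to the CA chain validation done by the TLS stack is an assumption, not something this issue specifies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not an ArcadeDB or Ratis class) for SAN-based peer identity. */
final class PeerSanCheck {
  // GeneralName type tag for dNSName in the subjectAltName extension (RFC 5280).
  private static final int SAN_DNS_NAME = 2;

  /**
   * Extracts lower-cased dNSName values from the collection shape returned by
   * X509Certificate.getSubjectAlternativeNames(): each entry is a list whose first
   * element is the Integer type tag and whose second element is the value.
   */
  static List<String> dnsNames(final Collection<List<?>> subjectAltNames) {
    final List<String> names = new ArrayList<>();
    if (subjectAltNames == null)
      return names; // the JDK returns null when the certificate has no SAN extension
    for (final List<?> entry : subjectAltNames) {
      if (entry.size() >= 2 && Integer.valueOf(SAN_DNS_NAME).equals(entry.get(0)) && entry.get(1) instanceof String)
        names.add(((String) entry.get(1)).toLowerCase(Locale.ROOT));
    }
    return names;
  }

  /** Returns true if any dNSName SAN equals one of the expected (lower-case) peer hosts. */
  static boolean matchesKnownPeer(final Collection<List<?>> subjectAltNames, final Set<String> expectedPeerHosts) {
    for (final String name : dnsNames(subjectAltNames)) {
      if (expectedPeerHosts.contains(name))
        return true;
    }
    return false;
  }
}
```

In a real handshake the input would come from the peer certificate of the TLS session; the sketch only covers the name comparison and deliberately ignores wildcard SANs and CN fallback.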

Proposed configuration surface

New settings under arcadedb.ha.tls.*:

  • arcadedb.ha.tls.enabled - boolean, default false (preserve zero-config dev/test)
  • arcadedb.ha.tls.certChainFile - PEM file with this node's certificate (and intermediates if any)
  • arcadedb.ha.tls.privateKeyFile - PEM file with this node's private key
  • arcadedb.ha.tls.trustCertCollectionFile - PEM file with the cluster CA certificate(s)
  • arcadedb.ha.tls.mutualAuth - boolean, default true
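As a rough sketch of how these settings and their defaults could be read, the class below parses them from a plain java.util.Properties. HaTlsSettings is a hypothetical holder invented for illustration; the actual implementation would presumably go through ArcadeDB's own configuration classes, which are not shown here.

```java
import java.util.Properties;

/** Hypothetical holder for the proposed arcadedb.ha.tls.* settings (illustration only). */
final class HaTlsSettings {
  static final String PREFIX = "arcadedb.ha.tls.";

  final boolean enabled;
  final String  certChainFile;
  final String  privateKeyFile;
  final String  trustCertCollectionFile;
  final boolean mutualAuth;

  private HaTlsSettings(final boolean enabled, final String certChainFile, final String privateKeyFile,
      final String trustCertCollectionFile, final boolean mutualAuth) {
    this.enabled = enabled;
    this.certChainFile = certChainFile;
    this.privateKeyFile = privateKeyFile;
    this.trustCertCollectionFile = trustCertCollectionFile;
    this.mutualAuth = mutualAuth;
  }

  /** Reads the settings, applying the proposed defaults: enabled=false, mutualAuth=true. */
  static HaTlsSettings from(final Properties props) {
    return new HaTlsSettings(
        Boolean.parseBoolean(props.getProperty(PREFIX + "enabled", "false")),
        props.getProperty(PREFIX + "certChainFile"),           // null when not configured
        props.getProperty(PREFIX + "privateKeyFile"),          // null when not configured
        props.getProperty(PREFIX + "trustCertCollectionFile"), // null when not configured
        Boolean.parseBoolean(props.getProperty(PREFIX + "mutualAuth", "true")));
  }
}
```

With an empty Properties object this yields enabled=false and mutualAuth=true, which keeps the zero-config dev/test behaviour described above.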

Wiring in RaftPropertiesBuilder / RaftHAServer:

  • When arcadedb.ha.tls.enabled=true, build a GrpcTlsConfig from the configured files and install it via GrpcConfigKeys.TLS.setConf(parameters, tlsConf) on the Parameters object passed to RaftServer.Builder.setParameters(...).
  • Apply the same GrpcTlsConfig on the RaftClient.Builder.setParameters(...) used by RaftHAServer.buildRaftClient() so the leader's self-client also speaks TLS.
  • Fail fast on startup if TLS is enabled but any of the cert/key/trust paths is missing or unreadable, with a clear error message that names the offending setting.
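The fail-fast check in the last bullet could look roughly like the sketch below. HaTlsFileCheck and its methods are hypothetical names, the exception type and message wording are assumptions, and the Ratis wiring itself (GrpcTlsConfig, Parameters) is intentionally left out; only the path validation is shown.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical startup check (illustration only) for the arcadedb.ha.tls.* file settings. */
final class HaTlsFileCheck {

  /** Returns one human-readable problem per unusable setting; an empty list means all files look usable. */
  static List<String> findProblems(final Map<String, String> settingToPath) {
    final List<String> problems = new ArrayList<>();
    for (final Map.Entry<String, String> e : settingToPath.entrySet()) {
      final String value = e.getValue();
      if (value == null || value.trim().isEmpty()) {
        problems.add(e.getKey() + " is not set");
        continue;
      }
      final Path path = Paths.get(value);
      if (!Files.isRegularFile(path))
        problems.add(e.getKey() + " points to a missing file: " + path);
      else if (!Files.isReadable(path))
        problems.add(e.getKey() + " points to an unreadable file: " + path);
    }
    return problems;
  }

  /** Throws with every problem listed, so the operator can fix all paths in one pass. */
  static void requireUsable(final Map<String, String> settingToPath) {
    final List<String> problems = findProblems(settingToPath);
    if (!problems.isEmpty())
      throw new IllegalStateException(
          "arcadedb.ha.tls.enabled=true but the TLS files are not usable: " + String.join("; ", problems));
  }
}
```

The map would hold the three file settings (certChainFile, privateKeyFile, trustCertCollectionFile) keyed by their full setting names, and requireUsable would run before any GrpcTlsConfig is built. Collecting all problems before throwing is a design choice of this sketch, not a requirement of the issue.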

Documentation deliverables

  • docs/arcadedb-ha-*.md section on production hardening, including:
    • openssl-based recipe for a self-signed cluster CA and per-node cert (already drafted in internal notes)
    • cert-manager recipe for Kubernetes StatefulSet deployments (SAN must match the pod's stable DNS name)
    • Vault PKI / internal-PKI guidance for orgs that already run a CA
    • Operational notes on rotation and expiry
  • Release-notes callout: mTLS is the supported way to secure the Raft gRPC port; the peer-address allowlist is a best-effort default, not a substitute.

Test plan

  • Unit test: RaftPropertiesBuilder builds a correct GrpcTlsConfig from the new settings and attaches it to Parameters.
  • Integration test: 3-node BaseRaftHATest subclass that boots with mTLS enabled using a test CA under src/test/resources; verifies a full leader election + transaction commit + follower catch-up round-trip.
  • Negative integration test: a 4th node with a cert signed by a different CA attempts to join; handshake is rejected and no log entries are accepted from it.
  • Snapshot HTTP handler: confirm the existing X-ArcadeDB-Cluster-Token check on SnapshotHttpHandler is preserved (TLS on gRPC does not replace the HTTP token).

Out of scope for this issue

  • Cert rotation without restart (can be a follow-up once the static-config path lands)
  • Integrating with an external KMS or Vault agent for cert retrieval (leave it to ops; document the pattern)
