fix(helm): production-ready Helm chart aligned with ha-raft subsystem #4034

@robfrank

Problem

The k8s/helm/ chart has several critical bugs that prevent HA from forming and expose insecure defaults. It also has not been updated to reflect the new ha-raft Apache Ratis subsystem.


Critical Bugs

1. Environment variable expansion broken (${HOSTNAME} not resolved)

statefulset.yaml passes JVM arguments using shell syntax (${HOSTNAME}, ${rootPassword}) inside a Kubernetes command: array. Kubernetes exec-form does not perform shell substitution, so the JVM receives the literal string ${HOSTNAME} as the server name. This breaks:

  • HA peer identity resolution (RaftPeerAddressResolver.findLocalPeerId) - all nodes appear as ${HOSTNAME}
  • Authentication - the password is set to the literal string ${rootPassword}

Expected: Use Kubernetes-native $(VAR_NAME) substitution and declare HOSTNAME explicitly via downward API (fieldRef: metadata.name).
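
The expected form can be sketched as a container fragment. This is a minimal illustration, not the chart's actual template: the container name, the Secret name `arcadedb-credentials`, and the `bin/server.sh` entry point are assumptions.

```yaml
# Fragment of the pod template in statefulset.yaml (illustrative names).
containers:
  - name: arcadedb                      # assumed container name
    env:
      # Declare HOSTNAME explicitly from the pod name via the downward API,
      # so that $(HOSTNAME) below refers to a variable Kubernetes knows about.
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      # Assumed: the root password is stored in a Secret named arcadedb-credentials.
      - name: rootPassword
        valueFrom:
          secretKeyRef:
            name: arcadedb-credentials
            key: rootPassword
    command:
      - bin/server.sh                   # assumed entry point
      # $(VAR) is expanded by Kubernetes before the process starts;
      # ${VAR} would reach the JVM as a literal string.
      - -Darcadedb.server.name=$(HOSTNAME)
      - -Darcadedb.server.rootPassword=$(rootPassword)
```

Kubernetes only expands `$(VAR_NAME)` references to variables declared in the container's `env:` list, which is why HOSTNAME has to be declared even though the container runtime usually sets it inside the container.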

2. Wrong Raft port (2424 instead of 2434)

The old binary-protocol HA used port 2424. The new ha-raft module (Apache Ratis gRPC) uses port 2434 (GlobalConfiguration.HA_RAFT_PORT). The chart defaults to 2424, so peers can never connect.

3. Invalid HA property -Darcadedb.ha.replicationIncomingHost

This property does not exist in the ha-raft subsystem. It is a leftover from the old binary-protocol HA and causes a startup warning/error.

4. HA unconditionally enabled for single-node deployments

HA JVM arguments are always emitted, even for replicaCount: 1 (dev/test deployments). HA should only be enabled when replicaCount > 1 or autoscaling.enabled: true.
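
One way to express that condition in the template is sketched below. The surrounding `command:` entries are assumptions, and `-Darcadedb.ha.enabled=true` stands in for whatever HA arguments the chart actually emits.

```yaml
# Fragment of templates/statefulset.yaml (Helm template syntax).
command:
  - bin/server.sh                       # assumed entry point
  {{- if or (gt (int .Values.replicaCount) 1) .Values.autoscaling.enabled }}
  # Emitted only for multi-replica or autoscaled releases.
  - -Darcadedb.ha.enabled=true          # placeholder for the chart's HA arguments
  {{- end }}
```

With `replicaCount: 1` and `autoscaling.enabled: false`, the block renders nothing and the server starts as a standalone node.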

5. Bootstrap deadlock - headless service missing publishNotReadyAddresses: true

Without this field, Kubernetes only publishes DNS entries for pods that have passed their readiness probe. During HA bootstrap, pods need to resolve each other before any of them is ready - causing a deadlock where no node can start because it cannot discover its peers.
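
A headless Service with the field set might look like the following sketch. The Service name and selector label are illustrative assumptions; the port is the Raft port described above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: arcadedb-headless               # assumed name
spec:
  clusterIP: None                       # headless: DNS returns per-pod records
  # Publish DNS records for pods that have not passed readiness yet,
  # so peers can resolve each other during HA bootstrap.
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/name: arcadedb    # assumed label
  ports:
    - name: raft
      port: 2434
      targetPort: 2434
```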

6. Ingress backend points at headless service

ingress.yaml references the headless service as the ingress backend. Ingress controllers cannot route to headless services. The port key also references a non-existent $.Values.service.port instead of $.Values.service.http.port.
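
For illustration, a rendered Ingress that targets a regular ClusterIP Service could look like this. The host, the Service name `arcadedb-http`, and the port number 2480 are assumptions; in the chart the port would come from $.Values.service.http.port.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: arcadedb                        # assumed name
spec:
  rules:
    - host: arcadedb.example.com        # assumed host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # A regular ClusterIP Service, not the headless one.
                name: arcadedb-http     # assumed name
                port:
                  number: 2480          # assumed value of service.http.port
```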


Additional Issues

HPA / KubernetesAutoJoin not supported

When autoscaling.enabled: true, the arcadedb.nodenames helper sizes the server list to replicaCount rather than autoscaling.maxReplicas. This means KubernetesAutoJoin cannot resolve pod ordinals beyond the initial replica count, and scale-up fails.

There is also no Raft quorum guard: nothing prevents minReplicas from being set below floor(maxReplicas/2) + 1, so the HPA could scale the cluster down past quorum and cause it to lose availability.
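
As a worked example of the quorum bound, assuming the `autoscaling` values follow the usual Helm layout (the specific numbers are illustrative):

```yaml
autoscaling:
  enabled: true
  maxReplicas: 5
  # Quorum for 5 peers: floor(5/2) + 1 = 3.
  # A guard would reject any minReplicas below 3 for this maxReplicas.
  minReplicas: 3
```

With `maxReplicas: 5` and `minReplicas: 2`, a scale-down could leave only two of five configured peers running, which is below the majority of three.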

Insecure and incorrect defaults

| Setting | Current (wrong) | Expected |
| --- | --- | --- |
| arcadedb.defaultDatabases | Universe[foo:bar] | "" (no hardcoded credentials) |
| arcadedb.extraCommands | development mode | production mode |
| podSecurityContext | {} | runAsNonRoot: true, fsGroup: 1000 |
| securityContext | {} | runAsUser/Group: 1000, allowPrivilegeEscalation: false, capabilities.drop: [ALL] |
| serviceAccount.automount | true | false (ArcadeDB does not call the K8s API) |
| service.http.type | LoadBalancer | ClusterIP |
| service.rpc.port | 2424 | 2434 |
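
Taken together, the expected defaults would correspond to a values.yaml fragment roughly like the one below. The nesting is inferred from the dotted setting names, and arcadedb.extraCommands is left as a comment because the exact production-mode value is not stated here.

```yaml
arcadedb:
  defaultDatabases: ""                  # no hardcoded credentials
  # extraCommands: should select production mode instead of development mode

podSecurityContext:
  runAsNonRoot: true
  fsGroup: 1000

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

serviceAccount:
  automount: false                      # ArcadeDB does not call the K8s API

service:
  http:
    type: ClusterIP
  rpc:
    port: 2434
```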

No persistence by default

There is no persistence section. The chart does not create a PersistentVolumeClaim for the database directory, so data is lost on pod restart.
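
A StatefulSet can request one volume per pod through volumeClaimTemplates. The fragment below is a sketch only: the volume name, the mount path `/home/arcadedb/databases`, and the 10Gi size are assumptions and would normally be driven by a persistence section in values.yaml.

```yaml
# Fragment of the StatefulSet spec (illustrative values).
spec:
  template:
    spec:
      containers:
        - name: arcadedb                # assumed container name
          volumeMounts:
            - name: data
              mountPath: /home/arcadedb/databases   # assumed database directory
  volumeClaimTemplates:
    - metadata:
        name: data                      # must match the volumeMounts name above
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi               # assumed default size
```

Each replica gets its own claim named after the template and the pod, and the claim survives pod restarts and rescheduling.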

No NetworkPolicy

The Raft gRPC port (2434) is exposed to any workload in the cluster. RaftHAServer.java includes a security note recommending that operators restrict access to this port via NetworkPolicy rules - the chart should provide an opt-in policy.
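
An opt-in policy along these lines would limit the Raft port to ArcadeDB pods. The policy name, the pod label, and the HTTP port 2480 are assumptions made for the sketch.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arcadedb-raft                   # assumed name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: arcadedb  # assumed label
  policyTypes:
    - Ingress
  ingress:
    # Raft gRPC: only other ArcadeDB pods in the same namespace may connect.
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: arcadedb
      ports:
        - protocol: TCP
          port: 2434
    # HTTP API: still reachable from any source (assumed port 2480).
    - ports:
        - protocol: TCP
          port: 2480
```

The second rule matters because once a pod is selected by an Ingress policy, any traffic not explicitly allowed is dropped, including client traffic to the HTTP API.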

Stale README

README.md documents the old 2424 port, wrong service type default (LoadBalancer), wrong defaultDatabases example, and is missing the persistence, networkPolicy, podSecurityContext, and securityContext sections entirely.


Affected Files

k8s/helm/Chart.yaml, k8s/helm/values.yaml, k8s/helm/templates/statefulset.yaml, k8s/helm/templates/_helpers.tpl, k8s/helm/templates/service.yaml, k8s/helm/templates/ingress.yaml, k8s/helm/templates/hpa.yaml, k8s/helm/templates/NOTES.txt, k8s/helm/templates/extra-manifests.yaml, k8s/helm/README.md
