Kubernetes: Trouble establishing HA #2176

@milesgranger

Hi again, I'm working on fixing the Helm chart for persistence and HA, but I seem to be missing something in the HA setup. With a replica count of 2, and following the High Availability docs, the servers seem to have a hard time communicating with each other.

At the end are the logs from the headless Service; as you will see, 'arcadedb-0' cannot reach 'arcadedb-1' and vice versa. Since logs are showing up in the headless Service, I believe the k8s mechanics are in order, but the servers are rejecting the incoming requests for some reason.

Here are the StatefulSet and headless Service manifests after Helm rendering. However, in the startup logs the server seems convinced that there is no serverList and that it's not running on k8s (although it announces that it is later during startup). I've tried setting these flags via ARCADEDB_SETTINGS and as command-line args... hoping I'm just doing something dumb. 😅

# Source: arcadedb/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: arcadedb
  labels:
    app: arcadedb
    helm.sh/chart: arcadedb-0.1.0
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
    app.kubernetes.io/version: "25.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  clusterIP: None
  ports:
    - port: 2480
      targetPort: http
      protocol: TCP
      name: http
    - port: 2424
      targetPort: rpc
      protocol: TCP
      name: rpc
  selector:
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
---
# Source: arcadedb/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: arcadedb
  labels:
    app: arcadedb
    helm.sh/chart: arcadedb-0.1.0
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
    app.kubernetes.io/version: "25.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: arcadedb
      app.kubernetes.io/instance: arcadedb
  template:
    metadata:
      labels:
        app: arcadedb
        helm.sh/chart: arcadedb-0.1.0
        app.kubernetes.io/name: arcadedb
        app.kubernetes.io/instance: arcadedb
        app.kubernetes.io/version: "25.2.1"
        app.kubernetes.io/managed-by: Helm
    spec:
      serviceAccountName: arcadedb
      containers:
        - name: arcadedb
          securityContext:
            runAsUser: 0
          image: "arcadedata/arcadedb:25.2.1"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 2480
              protocol: TCP
          command:
            - bin/server.sh
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: datadir
              mountPath: /mnt/data0
          env:
            # xref: not setting via cmdline due to issue:
            # https://github.com/ArcadeData/arcadedb/issues/1614#issuecomment-2189446492
            - name: ARCADEDB_SETTINGS
              value: |
               -Darcadedb.dumpConfigAtStartup=true
               -Darcadedb.server.name=${HOSTNAME}
               -Darcadedb.server.rootPassword=${rootPassword}
               -Darcadedb.server.databaseDirectory=/mnt/data0/databases
               -Darcadedb.server.defaultDatabases=Universe[foo:bar]
               -Darcadedb.ha.enabled=true
               -Darcadedb.ha.replicationIncomingHost=0.0.0.0
               -Darcadedb.ha.serverList=arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424,arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424
               -Darcadedb.ha.k8s=true
               -Darcadedb.ha.k8sSuffix=.arcadedb.arcadedb.svc.cluster.local
               -Darcadedb.server.mode=development

            - name: POD_ID
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: rootPassword
              valueFrom:
                secretKeyRef:
                  name: arcadedb-root-password-secret
                  key: rootPassword
                  optional: false
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - arcadedb
              topologyKey: kubernetes.io/hostname
            weight: 100
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Retain
  volumeClaimTemplates:
    - metadata:
        name: datadir
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: longhorn
        resources:
          requests:
            storage: "4Gi"
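
A few things in the rendered manifests may be worth ruling out. None of them is confirmed as the cause here. The StatefulSet sets no `spec.serviceName`, which is the field Kubernetes uses to give pods their `<pod>.<service>.<namespace>.svc` DNS names. The container declares only the `http` port, while the Service's `targetPort: rpc` refers to a named port that does not exist. The headless Service also publishes pod records only once pods are ready, unless `publishNotReadyAddresses` is set. The fragment below is a minimal sketch of those three fields, assuming the governing Service is the `arcadedb` Service above. It is not the chart's actual output, and fields that would stay the same (probes, env, volumes, affinity) are left out.

```yaml
# Hypothetical fragment for comparison only; unchanged fields are omitted.
apiVersion: v1
kind: Service
metadata:
  name: arcadedb
spec:
  clusterIP: None
  # Assumption: peers need each other's DNS records before readiness passes.
  publishNotReadyAddresses: true
  ports:
    - name: http
      port: 2480
      targetPort: http
      protocol: TCP
    - name: rpc
      port: 2424
      targetPort: rpc
      protocol: TCP
  selector:
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: arcadedb
spec:
  # Governing headless Service; pods get <pod>.arcadedb.<namespace>.svc names from it.
  serviceName: arcadedb
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: arcadedb
      app.kubernetes.io/instance: arcadedb
  template:
    metadata:
      labels:
        app.kubernetes.io/name: arcadedb
        app.kubernetes.io/instance: arcadedb
    spec:
      containers:
        - name: arcadedb
          image: "arcadedata/arcadedb:25.2.1"
          ports:
            - name: http
              containerPort: 2480
              protocol: TCP
            # Named port that the Service's targetPort: rpc refers to.
            - name: rpc
              containerPort: 2424
              protocol: TCP
```

The hostnames in `arcadedb.ha.serverList` also assume the release is installed in a namespace called `arcadedb`. If it is installed elsewhere, the middle label of those names would need to change.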

Startup logs from arcadedb-0

Odd that the serverList is empty and it doesn't think it's on k8s, even though later it announces it is running inside k8s? But it knows to start contacting arcadedb-1, and the list is clearly set in the config above... so maybe these are the initial values that are updated later?

  + arcadedb.bucketDefaultPageSize = 65536
  + arcadedb.bucketWipeOutOnDelete = true
  + arcadedb.command.timeout = 0
  + arcadedb.command.warningsEvery = 100
  + arcadedb.commitLockTimeout = 5000
  + arcadedb.cypher.statementCache = 1000
  + arcadedb.dateFormat = yyyy-MM-dd
  + arcadedb.dateImplementation = class java.util.Date
  + arcadedb.dateTimeFormat = yyyy-MM-dd HH:mm:ss
  + arcadedb.dateTimeImplementation = class java.util.Date
  + arcadedb.dumpConfigAtStartup = true
  + arcadedb.dumpMetricsEvery = 0
  + arcadedb.freePageRAM = 50
  + arcadedb.gremlin.timeout = 30000
  + arcadedb.gremlin.engine = auto
  + arcadedb.ha.clusterName = arcadedb
  + arcadedb.ha.enabled = false
  + arcadedb.ha.k8s = false
  + arcadedb.ha.k8sSuffix =
  + arcadedb.ha.quorum = majority
  + arcadedb.ha.quorumTimeout = 10000
  + arcadedb.ha.replicationChunkMaxSize = 16777216
  + arcadedb.ha.replicationFileMaxSize = 1073741824
  + arcadedb.ha.replicationIncomingHost = 0.0.0.0
  + arcadedb.ha.replicationIncomingPorts = 2424-2433
  + arcadedb.ha.replicationQueueSize = 512
  + arcadedb.ha.serverList =
  + arcadedb.ha.serverRole = any
  + arcadedb.indexCompactionMinPagesSchedule = 10
  + arcadedb.indexCompactionRAM = 300
  + arcadedb.initialPageCacheSize = 65535
  + arcadedb.maxPageRAM = 4096
  + arcadedb.mongo.host = 0.0.0.0
  + arcadedb.mongo.port = 27017
  + arcadedb.network.socketTimeout = 30000
  + arcadedb.ssl.keyStore = null
  + arcadedb.ssl.keyStorePassword = null
  + arcadedb.ssl.trustStore = null
  + arcadedb.ssl.trustStorePassword = null
  + arcadedb.ssl.enabled = false
  + arcadedb.pageFlushQueue = 512
  + arcadedb.polyglotCommand.timeout = 10000
  + arcadedb.postgres.debug = false
  + arcadedb.postgres.host = 0.0.0.0
  + arcadedb.postgres.port = 5432
  + arcadedb.profile = default
  + arcadedb.queryMaxHeapElementsAllowedPerOp = 500000
  + arcadedb.redis.host = 0.0.0.0
  + arcadedb.redis.port = 6379
  + arcadedb.server.backupDirectory = ${arcadedb.server.rootPath}/backups
  + arcadedb.server.databaseDirectory = ${arcadedb.server.rootPath}/databases
  + arcadedb.server.databaseLoadAtStartup = true
  + arcadedb.server.defaultDatabases =
  + arcadedb.server.defaultDatabaseMode = READ_WRITE
  + arcadedb.server.httpsIncomingPort = 2490-2499
  + arcadedb.server.httpIncomingHost = 0.0.0.0
  + arcadedb.server.httpIncomingPort = 2480-2489
  + arcadedb.server.httpsIoThreads = 0
  + arcadedb.server.httpSessionExpireTimeout = 1800
  + arcadedb.serverMetrics = true
  + arcadedb.serverMetrics.logging = false
  + arcadedb.server.mode = development
  + arcadedb.server.name = ArcadeDB_0
  + arcadedb.server.plugins =
  + arcadedb.server.rootPassword = null
  + arcadedb.server.rootPasswordPath = null
  + arcadedb.server.rootPath = null
  + arcadedb.server.securityAlgorithm = PBKDF2WithHmacSHA256
  + arcadedb.server.reloadEvery = 5000
  + arcadedb.server.securitySaltCacheSize = 64
  + arcadedb.server.saltIterations = 65536
  + arcadedb.server.eventBusQueueSize = 1000
  + arcadedb.sqlStatementCache = 300
  + arcadedb.test = false
  + arcadedb.txRetries = 3
  + arcadedb.txRetryDelay = 100
  + arcadedb.txWAL = true
  + arcadedb.txWalFlush = 0
  + arcadedb.typeDefaultBuckets = 1


2025-04-24 11:36:43.822 INFO  [ArcadeDBServer] Server is running inside Kubernetes. Hostname: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:43.826 INFO  [ArcadeDBServer] <arcadedb-0> ArcadeDB Server v25.2.1 (build 8896e2c572b6e5c32ce069a5517cc9688b0469a2/1740689842482/main) is starting up...
2025-04-24 11:36:43.844 INFO  [ArcadeDBServer] <arcadedb-0> Running on Linux 6.8.0-58-generic - OpenJDK 64-Bit Server VM 17.0.14 (Temurin-17.0.14+7)
2025-04-24 11:36:43.849 INFO  [ArcadeDBServer] <arcadedb-0> Starting ArcadeDB Server in production mode with plugins [] ...
2025-04-24 11:36:43.936 INFO  [ArcadeDBServer] <arcadedb-0> - Metrics Collection Started...
2025-04-24 11:36:44.628 INFO  [ServerSecurity] <arcadedb-0> Creating root user with the provided password
2025-04-24 11:36:45.408 INFO  [HttpServer] <arcadedb-0> - Starting HTTP Server (host=0.0.0.0 port=2480-2489 httpsPort=2490-2499)...
2025-04-24 11:36:45.504 INFO  [undertow] starting server: Undertow - 2.3.18.Final
2025-04-24 11:36:45.513 INFO  [xnio] XNIO version 3.8.16.Final
2025-04-24 11:36:45.522 INFO  [nio] XNIO NIO Implementation Version 3.8.16.Final
2025-04-24 11:36:45.586 INFO  [threads] JBoss Threads version 3.5.0.Final
2025-04-24 11:36:45.651 INFO  [HttpServer] <arcadedb-0> - HTTP Server started (host=0.0.0.0 port=2480 httpsPort=2490)
2025-04-24 11:36:45.668 INFO  [LeaderNetworkListener] <arcadedb-0> Listening for replication connections on 0.0.0.0:2424 (protocol v.-1)
2025-04-24 11:36:45.686 INFO  [HAServer] <arcadedb-0> Error connecting to the remote Leader server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 (error=com.arcadedb.network.binary.ConnectionException: Error on connecting to server 'arca
2025-04-24 11:36:45.687 INFO  [HAServer] <arcadedb-0> Unable to find any Leader, start election (cluster=arcadedb configuredServers=2 majorityOfVotes=2)
2025-04-24 11:36:45.690 INFO  [HAServer] Change election status from DONE to VOTING_FOR_ME
2025-04-24 11:36:45.690 INFO  [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=1 retry=0 lastReplicationMessage
2025-04-24 11:36:45.691 INFO  [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:45.691 INFO  [HAServer] Not able to be elected as Leader, waiting 1487ms and retry (turn=1 totalVotes=1 majority=2)
2025-04-24 11:36:45.866 INFO  [ArcadeDBServer] <arcadedb-0> Available query languages: [sqlscript, mongo, gremlin, java, cypher, js, graphql, sql]
2025-04-24 11:36:45.868 INFO  [ArcadeDBServer] <arcadedb-0> ArcadeDB Server started in 'production' mode (CPUs=2 MAXRAM=2.00GB)
2025-04-24 11:36:47.179 INFO  [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=2 retry=1 lastReplicationMessage
2025-04-24 11:36:47.182 INFO  [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:47.182 INFO  [HAServer] Not able to be elected as Leader, waiting 1741ms and retry (turn=2 totalVotes=1 majority=2)

Logs from the headless Service (arcadedb-0 and arcadedb-1)

arcadedb-0 2025-04-24 11:26:28.944 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=171 retry=170 lastRep
arcadedb-0 2025-04-24 11:26:28.945 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:28.217 INFO [HAServer] Not able to be elected as Leader, waiting 1626ms and retry (turn=152 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:29.844 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=153 retry=152 lastRep
arcadedb-1 2025-04-24 11:26:29.845 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:28.945 INFO [HAServer] Not able to be elected as Leader, waiting 1774ms and retry (turn=171 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:30.719 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=172 retry=171 lastRep
arcadedb-0 2025-04-24 11:26:30.722 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:29.845 INFO [HAServer] Not able to be elected as Leader, waiting 1767ms and retry (turn=153 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:31.612 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=154 retry=153 lastRep
arcadedb-1 2025-04-24 11:26:31.624 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:30.722 INFO [HAServer] Not able to be elected as Leader, waiting 1483ms and retry (turn=172 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:32.205 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=173 retry=172 lastRep
arcadedb-0 2025-04-24 11:26:32.207 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:32.207 INFO [HAServer] Not able to be elected as Leader, waiting 1214ms and retry (turn=173 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=174 retry=173 lastRep
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:31.625 INFO [HAServer] Not able to be elected as Leader, waiting 1797ms and retry (turn=154 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:33.422 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=155 retry=154 lastRep
arcadedb-1 2025-04-24 11:26:33.422 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Not able to be elected as Leader, waiting 1641ms and retry (turn=174 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:35.064 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=175 retry=174 lastRep
arcadedb-0 2025-04-24 11:26:35.075 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:33.423 INFO [HAServer] Not able to be elected as Leader, waiting 1979ms and retry (turn=155 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:35.402 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=156 retry=155 lastRep
arcadedb-1 2025-04-24 11:26:35.402 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
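
In the "Error contacting server" lines, the text after "for election:" is only the hostname. That may point to a name-resolution failure rather than a rejected connection, since a Java `UnknownHostException` typically carries just the hostname as its message. One way to test that assumption is a throwaway Pod that tries to resolve both peer names from inside the cluster. The sketch below is hypothetical: the Pod name `dns-check` and the `busybox:1.36` image are illustrative choices, and the namespace is assumed to be `arcadedb` to match the hostnames in the serverList.

```yaml
# Hypothetical one-off debugging Pod; delete it after reading its logs.
apiVersion: v1
kind: Pod
metadata:
  name: dns-check        # illustrative name
  namespace: arcadedb    # assumption: same namespace as the StatefulSet
spec:
  restartPolicy: Never
  containers:
    - name: dns-check
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          # Try to resolve each peer's per-pod DNS name and report failures.
          for host in arcadedb-0 arcadedb-1; do
            nslookup "$host.arcadedb.arcadedb.svc.cluster.local" \
              || echo "lookup failed: $host"
          done
```

After applying it, `kubectl logs dns-check -n arcadedb` shows whether each name resolves. If both lookups fail, the problem is more likely in the Service/StatefulSet wiring than in ArcadeDB's HA settings.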

Labels: enhancement
