Description
Hi again, I'm working on fixing the Helm chart for persistence and HA, but I seem to be missing something for the HA setup. With a replica count of 2, and following the High Availability docs, the servers are having a hard time communicating with each other.
At the end are the logs from the headless Service; as you will see, 'arcadedb-0' cannot reach 'arcadedb-1' and vice versa. Since logs are showing up in the headless Service, I believe the k8s mechanics are in order, but the servers are rejecting the incoming requests for some reason.
Here are the StatefulSet and headless Service manifests after Helm rendering. However, in the startup logs the server seems to be convinced there is no serverList and that it's not running on k8s (though it announces that it is later during startup). I've tried setting these flags via ARCADEDB_SETTINGS and as command-line args... hoping I'm just doing something dumb.
# Source: arcadedb/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: arcadedb
  labels:
    app: arcadedb
    helm.sh/chart: arcadedb-0.1.0
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
    app.kubernetes.io/version: "25.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  clusterIP: None
  ports:
    - port: 2480
      targetPort: http
      protocol: TCP
      name: http
    - port: 2424
      targetPort: rpc
      protocol: TCP
      name: rpc
  selector:
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
---
# Source: arcadedb/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: arcadedb
  labels:
    app: arcadedb
    helm.sh/chart: arcadedb-0.1.0
    app.kubernetes.io/name: arcadedb
    app.kubernetes.io/instance: arcadedb
    app.kubernetes.io/version: "25.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: arcadedb
      app.kubernetes.io/instance: arcadedb
  template:
    metadata:
      labels:
        app: arcadedb
        helm.sh/chart: arcadedb-0.1.0
        app.kubernetes.io/name: arcadedb
        app.kubernetes.io/instance: arcadedb
        app.kubernetes.io/version: "25.2.1"
        app.kubernetes.io/managed-by: Helm
    spec:
      serviceAccountName: arcadedb
      containers:
        - name: arcadedb
          securityContext:
            runAsUser: 0
          image: "arcadedata/arcadedb:25.2.1"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 2480
              protocol: TCP
          command:
            - bin/server.sh
          livenessProbe:
            httpGet:
              path: /
              port: http
          readinessProbe:
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: datadir
              mountPath: /mnt/data0
          env:
            # xref: not setting via cmdline due to issue:
            # https://github.com/ArcadeData/arcadedb/issues/1614#issuecomment-2189446492
            - name: ARCADEDB_SETTINGS
              value: |
                -Darcadedb.dumpConfigAtStartup=true
                -Darcadedb.server.name=${HOSTNAME}
                -Darcadedb.server.rootPassword=${rootPassword}
                -Darcadedb.server.databaseDirectory=/mnt/data0/databases
                -Darcadedb.server.defaultDatabases=Universe[foo:bar]
                -Darcadedb.ha.enabled=true
                -Darcadedb.ha.replicationIncomingHost=0.0.0.0
                -Darcadedb.ha.serverList=arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424,arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424
                -Darcadedb.ha.k8s=true
                -Darcadedb.ha.k8sSuffix=.arcadedb.arcadedb.svc.cluster.local
                -Darcadedb.server.mode=development
            - name: POD_ID
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: rootPassword
              valueFrom:
                secretKeyRef:
                  name: arcadedb-root-password-secret
                  key: rootPassword
                  optional: false
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - arcadedb
                topologyKey: kubernetes.io/hostname
              weight: 100
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Retain
  volumeClaimTemplates:
    - metadata:
        name: datadir
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: longhorn
        resources:
          requests:
            storage: "4Gi"

Startup logs from arcadedb-0
Odd that the serverList is empty and the server doesn't think it's on k8s, even though it later announces that it is running inside k8s. But it knows to start contacting arcadedb-1, and the list is clearly set in the config above... so maybe these are initial values that are updated later?
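To check this mechanically rather than by eye, here is a minimal Python sketch that compares the `-D` flags from ARCADEDB_SETTINGS with the `+ key = value` lines of the startup dump. The helper names (`parse_flags`, `parse_dump`, `diff_settings`) are made up for illustration, and the sample strings are short excerpts of the values in this issue. A mismatch only shows that the dump differs from the flags; it cannot tell whether a flag was ignored or whether the dump is simply printed before the overrides are applied.

```python
import re


def parse_flags(settings: str) -> dict[str, str]:
    """Parse whitespace-separated '-Dkey=value' tokens into a dict.

    Assumes no value contains whitespace, which holds for the flags in this issue.
    """
    flags = {}
    for token in settings.split():
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            flags[key] = value
    return flags


def parse_dump(dump: str) -> dict[str, str]:
    """Parse '+ key = value' lines from the startup configuration dump."""
    values = {}
    for line in dump.splitlines():
        match = re.match(r"^\+\s+(\S+)\s+=\s*(.*)$", line.strip())
        if match:
            values[match.group(1)] = match.group(2).strip()
    return values


def diff_settings(flags: dict[str, str], dumped: dict[str, str]) -> dict:
    """Return {key: (requested, dumped)} for every flag whose dumped value differs."""
    return {k: (v, dumped.get(k)) for k, v in flags.items() if dumped.get(k) != v}


# Excerpts copied from the manifest and the startup dump in this issue.
SETTINGS = """
-Darcadedb.ha.enabled=true
-Darcadedb.ha.k8s=true
-Darcadedb.ha.replicationIncomingHost=0.0.0.0
-Darcadedb.server.mode=development
"""
DUMP = """
+ arcadedb.ha.enabled = false
+ arcadedb.ha.k8s = false
+ arcadedb.ha.replicationIncomingHost = 0.0.0.0
+ arcadedb.server.mode = development
"""

for key, (requested, dumped) in sorted(diff_settings(parse_flags(SETTINGS), parse_dump(DUMP)).items()):
    print(f"{key}: requested={requested!r} dumped={dumped!r}")
```

With these excerpts, only `arcadedb.ha.enabled` and `arcadedb.ha.k8s` are reported as differing.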
+ arcadedb.bucketDefaultPageSize = 65536
+ arcadedb.bucketWipeOutOnDelete = true
+ arcadedb.command.timeout = 0
+ arcadedb.command.warningsEvery = 100
+ arcadedb.commitLockTimeout = 5000
+ arcadedb.cypher.statementCache = 1000
+ arcadedb.dateFormat = yyyy-MM-dd
+ arcadedb.dateImplementation = class java.util.Date
+ arcadedb.dateTimeFormat = yyyy-MM-dd HH:mm:ss
+ arcadedb.dateTimeImplementation = class java.util.Date
+ arcadedb.dumpConfigAtStartup = true
+ arcadedb.dumpMetricsEvery = 0
+ arcadedb.freePageRAM = 50
+ arcadedb.gremlin.timeout = 30000
+ arcadedb.gremlin.engine = auto
+ arcadedb.ha.clusterName = arcadedb
+ arcadedb.ha.enabled = false
+ arcadedb.ha.k8s = false
+ arcadedb.ha.k8sSuffix =
+ arcadedb.ha.quorum = majority
+ arcadedb.ha.quorumTimeout = 10000
+ arcadedb.ha.replicationChunkMaxSize = 16777216
+ arcadedb.ha.replicationFileMaxSize = 1073741824
+ arcadedb.ha.replicationIncomingHost = 0.0.0.0
+ arcadedb.ha.replicationIncomingPorts = 2424-2433
+ arcadedb.ha.replicationQueueSize = 512
+ arcadedb.ha.serverList =
+ arcadedb.ha.serverRole = any
+ arcadedb.indexCompactionMinPagesSchedule = 10
+ arcadedb.indexCompactionRAM = 300
+ arcadedb.initialPageCacheSize = 65535
+ arcadedb.maxPageRAM = 4096
+ arcadedb.mongo.host = 0.0.0.0
+ arcadedb.mongo.port = 27017
+ arcadedb.network.socketTimeout = 30000
+ arcadedb.ssl.keyStore = null
+ arcadedb.ssl.keyStorePassword = null
+ arcadedb.ssl.trustStore = null
+ arcadedb.ssl.trustStorePassword = null
+ arcadedb.ssl.enabled = false
+ arcadedb.pageFlushQueue = 512
+ arcadedb.polyglotCommand.timeout = 10000
+ arcadedb.postgres.debug = false
+ arcadedb.postgres.host = 0.0.0.0
+ arcadedb.postgres.port = 5432
+ arcadedb.profile = default
+ arcadedb.queryMaxHeapElementsAllowedPerOp = 500000
+ arcadedb.redis.host = 0.0.0.0
+ arcadedb.redis.port = 6379
+ arcadedb.server.backupDirectory = ${arcadedb.server.rootPath}/backups
+ arcadedb.server.databaseDirectory = ${arcadedb.server.rootPath}/databases
+ arcadedb.server.databaseLoadAtStartup = true
+ arcadedb.server.defaultDatabases =
+ arcadedb.server.defaultDatabaseMode = READ_WRITE
+ arcadedb.server.httpsIncomingPort = 2490-2499
+ arcadedb.server.httpIncomingHost = 0.0.0.0
+ arcadedb.server.httpIncomingPort = 2480-2489
+ arcadedb.server.httpsIoThreads = 0
+ arcadedb.server.httpSessionExpireTimeout = 1800
+ arcadedb.serverMetrics = true
+ arcadedb.serverMetrics.logging = false
+ arcadedb.server.mode = development
+ arcadedb.server.name = ArcadeDB_0
+ arcadedb.server.plugins =
+ arcadedb.server.rootPassword = null
+ arcadedb.server.rootPasswordPath = null
+ arcadedb.server.rootPath = null
+ arcadedb.server.securityAlgorithm = PBKDF2WithHmacSHA256
+ arcadedb.server.reloadEvery = 5000
+ arcadedb.server.securitySaltCacheSize = 64
+ arcadedb.server.saltIterations = 65536
+ arcadedb.server.eventBusQueueSize = 1000
+ arcadedb.sqlStatementCache = 300
+ arcadedb.test = false
+ arcadedb.txRetries = 3
+ arcadedb.txRetryDelay = 100
+ arcadedb.txWAL = true
+ arcadedb.txWalFlush = 0
+ arcadedb.typeDefaultBuckets = 1
2025-04-24 11:36:43.822 INFO [ArcadeDBServer] Server is running inside Kubernetes. Hostname: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:43.826 INFO [ArcadeDBServer] <arcadedb-0> ArcadeDB Server v25.2.1 (build 8896e2c572b6e5c32ce069a5517cc9688b0469a2/1740689842482/main) is starting up...
2025-04-24 11:36:43.844 INFO [ArcadeDBServer] <arcadedb-0> Running on Linux 6.8.0-58-generic - OpenJDK 64-Bit Server VM 17.0.14 (Temurin-17.0.14+7)
2025-04-24 11:36:43.849 INFO [ArcadeDBServer] <arcadedb-0> Starting ArcadeDB Server in production mode with plugins [] ...
2025-04-24 11:36:43.936 INFO [ArcadeDBServer] <arcadedb-0> - Metrics Collection Started...
2025-04-24 11:36:44.628 INFO [ServerSecurity] <arcadedb-0> Creating root user with the provided password
2025-04-24 11:36:45.408 INFO [HttpServer] <arcadedb-0> - Starting HTTP Server (host=0.0.0.0 port=2480-2489 httpsPort=2490-2499)...
2025-04-24 11:36:45.504 INFO [undertow] starting server: Undertow - 2.3.18.Final
2025-04-24 11:36:45.513 INFO [xnio] XNIO version 3.8.16.Final
2025-04-24 11:36:45.522 INFO [nio] XNIO NIO Implementation Version 3.8.16.Final
2025-04-24 11:36:45.586 INFO [threads] JBoss Threads version 3.5.0.Final
2025-04-24 11:36:45.651 INFO [HttpServer] <arcadedb-0> - HTTP Server started (host=0.0.0.0 port=2480 httpsPort=2490)
2025-04-24 11:36:45.668 INFO [LeaderNetworkListener] <arcadedb-0> Listening for replication connections on 0.0.0.0:2424 (protocol v.-1)
2025-04-24 11:36:45.686 INFO [HAServer] <arcadedb-0> Error connecting to the remote Leader server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 (error=com.arcadedb.network.binary.ConnectionException: Error on connecting to server 'arca
2025-04-24 11:36:45.687 INFO [HAServer] <arcadedb-0> Unable to find any Leader, start election (cluster=arcadedb configuredServers=2 majorityOfVotes=2)
2025-04-24 11:36:45.690 INFO [HAServer] Change election status from DONE to VOTING_FOR_ME
2025-04-24 11:36:45.690 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=1 retry=0 lastReplicationMessage
2025-04-24 11:36:45.691 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:45.691 INFO [HAServer] Not able to be elected as Leader, waiting 1487ms and retry (turn=1 totalVotes=1 majority=2)
2025-04-24 11:36:45.866 INFO [ArcadeDBServer] <arcadedb-0> Available query languages: [sqlscript, mongo, gremlin, java, cypher, js, graphql, sql]
2025-04-24 11:36:45.868 INFO [ArcadeDBServer] <arcadedb-0> ArcadeDB Server started in 'production' mode (CPUs=2 MAXRAM=2.00GB)
2025-04-24 11:36:47.179 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=2 retry=1 lastReplicationMessage
2025-04-24 11:36:47.182 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
2025-04-24 11:36:47.182 INFO [HAServer] Not able to be elected as Leader, waiting 1741ms and retry (turn=2 totalVotes=1 majority=2)
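The election lines above report `configuredServers=2`, `majorityOfVotes=2` and `totalVotes=1`. As a small worked example, assuming the quorum is a strict majority computed as floor(n / 2) + 1 (an assumption that matches the logged value for two servers but is not taken from the ArcadeDB source), a server that can only count its own vote can never win in a two-node cluster:

```python
def majority(configured_servers: int) -> int:
    """Votes needed for a strict majority: floor(n / 2) + 1 (assumed formula)."""
    return configured_servers // 2 + 1


def can_win_alone(configured_servers: int) -> bool:
    """True if a server that only receives its own vote reaches the majority."""
    return 1 >= majority(configured_servers)


for n in (1, 2, 3):
    print(f"servers={n} majority={majority(n)} can_win_alone={can_win_alone(n)}")
```

So as long as neither pod can contact the other, both keep retrying with `totalVotes=1` and no Leader is elected, which is what the logs show.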
Logs from the headless Service
arcadedb-0 2025-04-24 11:26:28.944 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=171 retry=170 lastRep
arcadedb-0 2025-04-24 11:26:28.945 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:28.217 INFO [HAServer] Not able to be elected as Leader, waiting 1626ms and retry (turn=152 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:29.844 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=153 retry=152 lastRep
arcadedb-1 2025-04-24 11:26:29.845 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:28.945 INFO [HAServer] Not able to be elected as Leader, waiting 1774ms and retry (turn=171 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:30.719 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=172 retry=171 lastRep
arcadedb-0 2025-04-24 11:26:30.722 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:29.845 INFO [HAServer] Not able to be elected as Leader, waiting 1767ms and retry (turn=153 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:31.612 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=154 retry=153 lastRep
arcadedb-1 2025-04-24 11:26:31.624 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:30.722 INFO [HAServer] Not able to be elected as Leader, waiting 1483ms and retry (turn=172 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:32.205 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=173 retry=172 lastRep
arcadedb-0 2025-04-24 11:26:32.207 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:32.207 INFO [HAServer] Not able to be elected as Leader, waiting 1214ms and retry (turn=173 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=174 retry=173 lastRep
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:31.625 INFO [HAServer] Not able to be elected as Leader, waiting 1797ms and retry (turn=154 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:33.422 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=155 retry=154 lastRep
arcadedb-1 2025-04-24 11:26:33.422 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
arcadedb-0 2025-04-24 11:26:33.422 INFO [HAServer] Not able to be elected as Leader, waiting 1641ms and retry (turn=174 totalVotes=1 majority=2)
arcadedb-0 2025-04-24 11:26:35.064 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=175 retry=174 lastRep
arcadedb-0 2025-04-24 11:26:35.075 INFO [HAServer] Error contacting server arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-1.arcadedb.arcadedb.svc.cluster.local
arcadedb-1 2025-04-24 11:26:33.423 INFO [HAServer] Not able to be elected as Leader, waiting 1979ms and retry (turn=155 totalVotes=1 majority=2)
arcadedb-1 2025-04-24 11:26:35.402 INFO [HAServer] Starting election of local server asking for votes from [arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424, arcadedb-1.arcadedb.arcadedb.svc.cluster.local:2424] (turn=156 retry=155 lastRep
arcadedb-1 2025-04-24 11:26:35.402 INFO [HAServer] Error contacting server arcadedb-0.arcadedb.arcadedb.svc.cluster.local:2424 for election: arcadedb-0.arcadedb.arcadedb.svc.cluster.local
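The "Error contacting server" lines print only the peer hostname, so they do not say whether the name failed to resolve or the connection on port 2424 was refused. One way to separate the two cases is a small probe run from inside a pod in the same namespace. This is a rough Python sketch, assuming a pod with Python available (the ArcadeDB image may not have it); `probe` is a hypothetical helper, not part of ArcadeDB.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a peer as 'ok', 'dns-failure' or 'connect-failure'."""
    try:
        # Step 1: name resolution only.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"
    address = infos[0][4]  # sockaddr tuple; the first two items are (ip, port)
    try:
        # Step 2: TCP connect to the first resolved address.
        with socket.create_connection(address[:2], timeout=timeout):
            return f"ok: {address[0]}:{port}"
    except OSError as exc:
        return f"connect-failure: {address[0]}:{port} ({exc})"


if __name__ == "__main__":
    # Peer hostnames are passed on the command line; 2424 is the replication port from the manifest.
    for peer in sys.argv[1:]:
        print(peer, "->", probe(peer, 2424))
```

For example, saved as `probe.py` (an arbitrary filename), it could be run as `python probe.py arcadedb-0.arcadedb.arcadedb.svc.cluster.local arcadedb-1.arcadedb.arcadedb.svc.cluster.local`. A `dns-failure` result would point at the Service/StatefulSet DNS setup, while a `connect-failure` would point at the listener or the network path.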