Summary
A node that joins an existing Raft HA cluster with an empty data directory never acquires databases it has not previously seen on disk. It only ends up with databases created from arcadedb.server.defaultDatabases at boot. Databases that were created or imported at runtime (e.g. via Studio) on the other members are silently missing on the new node, with no automatic recovery.
Reported in #4522 (Kubernetes StatefulSet scaled from 3 to 5 nodes): the two new pods came up with only the defaultDatabases database and never received the runtime-imported one.
Root cause
The follower snapshot-install path only refreshes databases the node already has on disk:
```java
// ha-raft/.../ArcadeStateMachine.java (notifyInstallSnapshotFromLeader)
for (final String dbName : server.getDatabaseNames()) {
  ...
  if (server.existsDatabase(dbName))
    SnapshotInstaller.install(dbName, ...);
}
```
server.getDatabaseNames() returns only the databases present locally, so a brand-new empty node has nothing to iterate over for databases it has never seen. There is no step that lists the leader's databases and pulls the ones the joining node is missing.
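The gap can be modelled in a few lines. This is an illustrative sketch, not ArcadeDB code. The two sets stand in for `server.getDatabaseNames()` and for a leader-side database listing, which is hypothetical because no such step exists today. The point is only that iterating the local set can never reach a database that exists solely on the leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model, not ArcadeDB code: the sets stand in for
// server.getDatabaseNames() (local) and a leader-side listing (hypothetical).
final class SnapshotScope {

  // Mirrors the current loop: only databases already on local disk are visited.
  static List<String> refreshedToday(Set<String> localDatabases) {
    return new ArrayList<>(new TreeSet<>(localDatabases));
  }

  // Databases the leader has but the joining node lacks; the loop never reaches them.
  static List<String> missingLocally(Set<String> localDatabases, Set<String> leaderDatabases) {
    TreeSet<String> missing = new TreeSet<>(leaderDatabases);
    missing.removeAll(localDatabases);
    return new ArrayList<>(missing);
  }

  public static void main(String[] args) {
    Set<String> local = Set.of("defaultdb");              // from defaultDatabases at boot
    Set<String> leader = Set.of("defaultdb", "imported"); // "imported" created at runtime
    System.out.println(refreshedToday(local));            // [defaultdb]
    System.out.println(missingLocally(local, leader));    // [imported]
  }
}
```

The database names are hypothetical. On a brand-new node the local set is only what `defaultDatabases` produced, so the second list is exactly what the #4522 pods never received.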
A runtime database is created on a peer only when its INSTALL_DATABASE_ENTRY is replayed from the Raft log (applyInstallDatabaseEntry -> server.createDatabase(...)), and a late joiner does not necessarily replay that entry (for example after log compaction, or on a fresh bootstrap with an empty log). The bootstrap-fingerprint path (applyBootstrapFingerprintEntry) explicitly records a baseline for absent databases and defers to a "follow-on snapshot" that never arrives in this scenario.
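The replay argument can be shown with a toy log. This sketch assumes that a joiner is brought up to date from a compaction point plus the log suffix after it. Under that assumption, an INSTALL_DATABASE_ENTRY at or below the compaction point is never replayed. Only that entry name comes from this issue. The `Entry` record, the `TRANSACTION` kind and the indices are hypothetical and do not represent ArcadeDB's log format.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, not ArcadeDB's log format: entry kinds other than
// INSTALL_DATABASE_ENTRY and all indices are hypothetical.
final class CompactedReplay {

  record Entry(long index, String kind, String database) {}

  // Databases a node creates by replaying only the entries after the compaction
  // point; anything at or below it is assumed to be covered by a snapshot instead.
  static Set<String> createdByReplay(List<Entry> log, long compactedUpTo) {
    Set<String> created = new LinkedHashSet<>();
    for (Entry e : log) {
      if (e.index() > compactedUpTo && e.kind().equals("INSTALL_DATABASE_ENTRY")) {
        created.add(e.database());
      }
    }
    return created;
  }

  public static void main(String[] args) {
    List<Entry> log = List.of(
        new Entry(1, "INSTALL_DATABASE_ENTRY", "imported"),
        new Entry(2, "TRANSACTION", "imported"),
        new Entry(3, "TRANSACTION", "imported"));
    System.out.println(createdByReplay(log, 0)); // [imported]  full log replayed
    System.out.println(createdByReplay(log, 2)); // []          install entry compacted away
  }
}
```

In the second case, the snapshot is the only remaining way to obtain the database. Because snapshot install only iterates local databases, nothing creates it.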
Aggravating factor observed in #4522
Because both the operator resync endpoint and the follower snapshot install download from the leader, the situation gets worse if a node that does not hold a given database is elected leader: the only authoritative copies live on followers, and there is no easy path to redistribute without first transferring leadership.
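Detecting this situation amounts to comparing each member's database set against the leader's. The sketch below assumes a hypothetical aggregated cluster view that maps member name to reported database names. No such listing is claimed to exist today, and the pod names are made up. It reports, for every database the leader lacks, which followers still hold a copy.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: `view` maps member name -> databases reported by that member.
final class LeaderGap {

  // For every database the leader lacks, the sorted set of followers that still hold it.
  static Map<String, Set<String>> heldOnlyByFollowers(String leader, Map<String, Set<String>> view) {
    Set<String> leaderDbs = view.getOrDefault(leader, Set.of());
    Map<String, Set<String>> result = new TreeMap<>();
    for (Map.Entry<String, Set<String>> member : view.entrySet()) {
      if (member.getKey().equals(leader)) {
        continue; // the leader cannot be missing its own databases
      }
      for (String db : member.getValue()) {
        if (!leaderDbs.contains(db)) {
          result.computeIfAbsent(db, k -> new TreeSet<>()).add(member.getKey());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> view = Map.of(
        "arcadedb-0", Set.of("defaultdb", "imported"),
        "arcadedb-1", Set.of("defaultdb", "imported"),
        "arcadedb-3", Set.of("defaultdb"));
    // A new pod became leader: only followers hold "imported".
    System.out.println(heldOnlyByFollowers("arcadedb-3", view)); // {imported=[arcadedb-0, arcadedb-1]}
  }
}
```

A non-empty result is the signal that pulling from the leader alone cannot fix the cluster.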
Current workarounds
- Transfer leadership to a node that holds the database (POST /api/v1/cluster/leader), then run POST /api/v1/cluster/resync/{database} on each node that is missing it.
- Offline: copy the database directory onto the new node's volume while the cluster is stopped.
Both are manual and easy to get wrong.
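The ordering of the online workaround can be written down as a small planner. This is a hedged sketch. Only the two endpoint paths come from this issue. The `Step` record, the node names and the "first holder in name order" choice of new leader are assumptions. Request bodies, how the leader-transfer target is specified, and authentication are deliberately not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical planner for the manual workaround; produces steps, sends nothing.
final class WorkaroundPlan {

  // For the leader step, `node` is the intended new leader; for resync steps,
  // it is the node the call must be run on.
  record Step(String method, String path, String node) {}

  static List<Step> plan(String database, String currentLeader, Map<String, Set<String>> view) {
    Map<String, Set<String>> members = new TreeMap<>(view); // deterministic order
    String holder = null;
    for (Map.Entry<String, Set<String>> m : members.entrySet()) {
      if (m.getValue().contains(database)) {
        holder = m.getKey();
        break;
      }
    }
    if (holder == null) {
      throw new IllegalStateException("no member holds " + database);
    }
    List<Step> steps = new ArrayList<>();
    // Step 1: move leadership only if the current leader lacks the database.
    if (!members.getOrDefault(currentLeader, Set.of()).contains(database)) {
      steps.add(new Step("POST", "/api/v1/cluster/leader", holder));
    }
    // Step 2: resync every member that is missing it (resync pulls from the leader).
    for (Map.Entry<String, Set<String>> m : members.entrySet()) {
      if (!m.getValue().contains(database)) {
        steps.add(new Step("POST", "/api/v1/cluster/resync/" + database, m.getKey()));
      }
    }
    return steps;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> view = Map.of(
        "arcadedb-0", Set.of("imported"),
        "arcadedb-3", Set.of(),
        "arcadedb-4", Set.of());
    plan("imported", "arcadedb-3", view).forEach(System.out::println);
  }
}
```

Even in this reduced form, the operator has to know every member's database set, which is part of why the workaround is easy to get wrong.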
Proposed improvement
When a node joins the cluster, it should reconcile its local database set against the leader's and automatically pull (full snapshot install) any database it is missing, rather than only refreshing the ones it already has. This should cover:
- A new empty node added to a live cluster (incremental StatefulSet scale-up).
- A node participating in a fresh bootstrap election after a full cluster restart.
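One possible shape for the join-time pass is sketched below under stated assumptions. It compares the local set with the leader's and performs one full snapshot install per missing database, sequentially. `installFullSnapshot` is a hypothetical callback standing in for a download-and-swap such as SnapshotInstaller, and the `Status` values are illustrative. Databases present locally but absent on the leader are intentionally not handled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical join-time reconciliation; installFullSnapshot returns true on success.
final class JoinReconciler {

  enum Status { PRESENT, INSTALLED, FAILED }

  static Map<String, Status> reconcile(Set<String> local, Set<String> leader,
                                       Predicate<String> installFullSnapshot) {
    Map<String, Status> status = new LinkedHashMap<>();
    // Iterate the LEADER's set, not the local one, so unseen databases are reached.
    for (String db : new TreeSet<>(leader)) {
      if (local.contains(db)) {
        status.put(db, Status.PRESENT); // left to the existing refresh path
        continue;
      }
      // One full snapshot at a time keeps bandwidth bounded on scale-up.
      boolean ok;
      try {
        ok = installFullSnapshot.test(db);
      } catch (RuntimeException ex) {
        ok = false; // a failed pull must not abort the others
      }
      status.put(db, ok ? Status.INSTALLED : Status.FAILED);
    }
    // Databases only present locally are not listed: see the DROP consideration.
    return status;
  }

  public static void main(String[] args) {
    Map<String, Status> result = reconcile(
        Set.of("defaultdb"), Set.of("defaultdb", "imported"), db -> true);
    System.out.println(result); // {defaultdb=PRESENT, imported=INSTALLED}
  }
}
```

A per-database status map like this is also a natural thing to expose in the cluster status, so a FAILED pull is visible rather than silent.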
Considerations:
- Pull from the leader, but guard against the "leader is missing the database" case (e.g. detect that a member holds a database the leader does not, and refuse to silently treat it as dropped). A DROP must remain distinguishable from "leader never had it."
- Keep it bandwidth-aware: full-snapshot pull per missing database, with the existing crash-safe download-and-swap (SnapshotInstaller).
- Surface progress/health in the cluster status and Studio HA panel.
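Keeping DROP distinguishable from "leader never had it" requires the leader to retain some durable record of drops. The sketch assumes a hypothetical set of dropped names (`leaderDropped`). This issue only requires that the two cases be distinguishable, not this mechanism. Given that assumption, the per-database decision on a member reduces to four cases.

```java
import java.util.Set;

// Hypothetical decision table; leaderDropped models a durable drop record on the leader.
final class AbsenceOnLeader {

  enum Decision { NONE_NEEDED, PULL_FROM_LEADER, DELETE_LOCAL, HOLD_AND_ALERT }

  static Decision decide(String db, Set<String> local, Set<String> leaderDbs, Set<String> leaderDropped) {
    boolean onLeader = leaderDbs.contains(db);
    boolean onMember = local.contains(db);
    if (onLeader) {
      // Present on both: existing refresh path; missing locally: full snapshot pull.
      return onMember ? Decision.NONE_NEEDED : Decision.PULL_FROM_LEADER;
    }
    if (!onMember) {
      return Decision.NONE_NEEDED; // absent everywhere we can see
    }
    if (leaderDropped.contains(db)) {
      return Decision.DELETE_LOCAL; // an explicit DROP the member has not applied yet
    }
    // Leader neither has it nor recorded a drop: never treat this as a DROP.
    return Decision.HOLD_AND_ALERT;
  }

  public static void main(String[] args) {
    // A member still holds "imported"; the leader has neither it nor a drop record.
    System.out.println(decide("imported", Set.of("imported"), Set.of(), Set.of())); // HOLD_AND_ALERT
  }
}
```

HOLD_AND_ALERT is the case from #4522 where the leader is the node missing the database. Surfacing it in the cluster status would give the operator the information that currently has to be gathered by hand.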
References
- ha-raft/src/main/java/com/arcadedb/server/ha/raft/ArcadeStateMachine.java (notifyInstallSnapshotFromLeader, applyInstallDatabaseEntry, applyBootstrapFingerprintEntry)
- SnapshotInstaller.java