HA: new cluster node does not auto-acquire databases it has never seen (only refreshes existing ones) #4727

## Summary

A node that joins an existing Raft HA cluster **with an empty data directory** never acquires databases it has not previously seen on disk. It only ends up with databases created from `arcadedb.server.defaultDatabases` at boot. Databases that were created or imported at runtime (e.g. via Studio) on the other members are silently missing on the new node, with no automatic recovery.

Reported in #4522 (Kubernetes StatefulSet scaled from 3 to 5 nodes): the two new pods came up with only the `defaultDatabases` database and never received the runtime-imported one.

## Root cause

The follower snapshot-install path only refreshes databases the node **already has on disk**:

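```java
// ha-raft/.../ArcadeStateMachine.java  (notifyInstallSnapshotFromLeader)
// Iterates the databases this node already has; the leader's list is never consulted.
for (final String dbName : server.getDatabaseNames()) {
  ...
  // Only databases present locally reach the installer, so a missing one is never pulled.
  if (server.existsDatabase(dbName))
    SnapshotInstaller.install(dbName, ...);
}
```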

`server.getDatabaseNames()` returns only the databases present locally, so on a brand-new empty node the loop never visits a database the node has not seen. No step lists the leader's databases and pulls the ones the joining node is missing.
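
To make the gap concrete, here is a small self-contained simulation. It is not ArcadeDB code: the class `LocalOnlyRefreshDemo`, its two methods, and the database names `default` and `imported` are hypothetical stand-ins. `refreshExisting` mimics the loop by looking at local names only, and `neverVisited` shows what the leader holds that the loop cannot reach.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation of the refresh loop; not ArcadeDB code.
final class LocalOnlyRefreshDemo {

  // Mimics the current behavior: only names already present locally are
  // visited, so each returned name stands in for one SnapshotInstaller call.
  static List<String> refreshExisting(final Set<String> localNames) {
    return List.copyOf(new TreeSet<>(localNames));
  }

  // Databases the leader holds that the loop above can never reach.
  static Set<String> neverVisited(final Set<String> leaderNames, final Set<String> localNames) {
    final Set<String> missing = new TreeSet<>(leaderNames);
    missing.removeAll(localNames);
    return missing;
  }

  public static void main(final String[] args) {
    final Set<String> leader = Set.of("default", "imported");
    // A node that booted with only the defaultDatabases entry.
    final Set<String> local = Set.of("default");
    System.out.println(refreshExisting(local));      // prints [default]
    System.out.println(neverVisited(leader, local)); // prints [imported]
  }
}
```

With an empty local set, `refreshExisting` returns an empty list, which corresponds to the empty-data-directory case described in the summary.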

A runtime database is created on a peer only when its `INSTALL_DATABASE_ENTRY` is replayed from the Raft log (`applyInstallDatabaseEntry` -> `server.createDatabase(...)`). A late joiner does not necessarily replay that entry, for example after log compaction or a fresh bootstrap with an empty log. The bootstrap-fingerprint path (`applyBootstrapFingerprintEntry`) explicitly records a baseline for absent databases and defers to a "follow-on snapshot" that never arrives in this scenario.

## Aggravating factor observed in #4522

Both the operator `resync` endpoint and the follower snapshot install download from the **leader**. The situation therefore gets worse if a node that does not hold a given database is elected leader: the only authoritative copies then live on followers, and there is no easy way to redistribute them without first transferring leadership.

## Current workarounds

- Transfer leadership to a node that holds the database (`POST /api/v1/cluster/leader`), then run `POST /api/v1/cluster/resync/{database}` on each node that is missing it.
- Offline: copy the database directory onto the new node's volume while the cluster is stopped.

Both are manual and easy to get wrong.
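
Since the order of operations is what tends to go wrong, the online workaround can be written down as an ordered request list. The sketch below only builds strings and sends nothing. The class `ResyncRunbook` and its parameters are hypothetical. This issue does not state which node should receive the leadership-transfer request, what payload it needs, or how authentication is supplied, so the target is a caller-provided parameter and request bodies are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds the request list; it sends nothing.
final class ResyncRunbook {

  // transferTargetBaseUrl: base URL of the node that receives the leadership
  //   transfer request (an assumption; the issue does not specify the target).
  // missingBaseUrls: base URLs of the nodes that lack the database.
  // database: database name, assumed to need no URL escaping.
  static List<String> steps(final String transferTargetBaseUrl, final List<String> missingBaseUrls,
      final String database) {
    final List<String> steps = new ArrayList<>();
    // Step 1: leadership must move to a holder first, because resync
    // downloads from the leader.
    steps.add("POST " + transferTargetBaseUrl + "/api/v1/cluster/leader");
    // Step 2: one resync per node that is missing the database.
    for (final String node : missingBaseUrls)
      steps.add("POST " + node + "/api/v1/cluster/resync/" + database);
    return steps;
  }
}
```

Keeping the transfer as the first entry reflects the constraint described above: a resync issued while the leader itself lacks the database has nothing to download.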

## Proposed improvement

When a node joins the cluster, it should reconcile its local database set against the leader's and automatically pull (full snapshot install) any database it is missing, rather than only refreshing the ones it already has. This should cover:

- A new empty node added to a live cluster (incremental StatefulSet scale-up).
- A node participating in a fresh bootstrap election after a full cluster restart.

Considerations:

- Pull from the leader, but guard against the "leader is missing the database" case (e.g. detect that a member holds a database the leader does not, and refuse to silently treat it as dropped). A DROP must remain distinguishable from "leader never had it."
- Keep it bandwidth-aware: full-snapshot pull per missing database, with the existing crash-safe download-and-swap (`SnapshotInstaller`).
- Surface progress/health in the cluster status and Studio HA panel.
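
One possible shape for the reconciliation step, including the guard from the first consideration, is sketched below. Everything in it is hypothetical: `JoinReconciler`, the `Action` enum, and in particular the `droppedInLog` input, which assumes the node can obtain an explicit record of DROP operations. How such a record would be obtained is not specified in this issue.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical reconciliation sketch; none of these types exist in ArcadeDB.
final class JoinReconciler {

  enum Action {
    PULL_FROM_LEADER, // leader has it, this node does not: full snapshot install
    CONFIRMED_DROP,   // leader lacks it and an explicit DROP was recorded
    LEADER_MISSING    // leader lacks it with no recorded DROP: flag, never delete
  }

  // local: databases on this node; leader: databases on the leader;
  // droppedInLog: databases with a recorded DROP (assumed to be available).
  static Map<String, Action> plan(final Set<String> local, final Set<String> leader,
      final Set<String> droppedInLog) {
    final Map<String, Action> plan = new TreeMap<>();
    // Databases the joining node is missing.
    for (final String db : new TreeSet<>(leader))
      if (!local.contains(db))
        plan.put(db, Action.PULL_FROM_LEADER);
    // Databases held locally that the leader does not have: keep a DROP
    // distinguishable from "leader never had it".
    for (final String db : new TreeSet<>(local))
      if (!leader.contains(db))
        plan.put(db, droppedInLog.contains(db) ? Action.CONFIRMED_DROP : Action.LEADER_MISSING);
    return plan;
  }
}
```

Databases present on both sides are deliberately absent from the plan, since the existing refresh path already handles them. A `LEADER_MISSING` entry is the case from #4522 where the authoritative copy lives on a follower, so it would be surfaced in cluster status rather than acted on automatically.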

## References

- Discussion: #4522
- Code: `ha-raft/src/main/java/com/arcadedb/server/ha/raft/ArcadeStateMachine.java` (`notifyInstallSnapshotFromLeader`, `applyInstallDatabaseEntry`, `applyBootstrapFingerprintEntry`), `SnapshotInstaller.java`

