HA: Raft snapshot install crashes in notifyInstallSnapshotFromLeader — ServerDatabase.close() throws UnsupportedOperationException (26.6.1), follower never rejoins, cluster loses write quorum

## Summary

On a 3-node HA cluster (Kubernetes StatefulSet), when a follower is told by the leader to perform a **full snapshot resync**, the snapshot installation crashes on **every** attempt because `ArcadeStateMachine.notifyInstallSnapshotFromLeader` calls `ServerDatabase.close()` on a shared, server-managed database — and `ServerDatabase.close()` is hardcoded to throw `UnsupportedOperationException`.

The follower can therefore never complete the snapshot install and never rejoins the Raft group, so the cluster loses write quorum and stays that way until someone intervenes manually. All writes then fail with `QuorumNotReachedException`, and the node spins in a tight retry loop (we observed ~2.77M log lines in a 7-minute window, roughly 6,600 lines per second).

## Version

- ArcadeDB **26.6.1**
- 3-node HA cluster (`*-0`, `*-1`, `*-2`) on Kubernetes, Raft replication

## What happens

The leader is node `-1`. Followers `-0` and `-2` are both asked to do a full resync (`firstLogIndex=(t:11, i:98885)`) and fail on every attempt:

```
Installing snapshot for database '.raft' from leader <node-1>:2480...
Snapshot installation requested from leader (firstLogIndex=(t:11, i:98885)). Starting full resync...
Error during snapshot installation from leader
<node>_2434@group-XXXX: Failed to notify StateMachine to InstallSnapshot.
    Exception: java.lang.RuntimeException: Error during Raft snapshot installation
```

Underlying cause (logged at `SEVERE`):

```
java.lang.UnsupportedOperationException: Embedded database taken from the server are shared and therefore cannot be closed
	at com.arcadedb.server.ServerDatabase.close(ServerDatabase.java:103)
	at com.arcadedb.server.ha.raft.ArcadeStateMachine.lambda$notifyInstallSnapshotFromLeader$4(ArcadeStateMachine.java:506)
```

## Root cause

`ServerDatabase.close()` intentionally rejects closing a shared server database:

```java
// ServerDatabase.java:103
throw new UnsupportedOperationException(
    "Embedded database taken from the server are shared and therefore cannot be closed");
```

But the snapshot-install path (`ArcadeStateMachine.notifyInstallSnapshotFromLeader`, ~line 506) calls `close()` on that very database before swapping in the snapshot, so the install always aborts. The failure is deterministic, not transient: once a follower needs a full snapshot resync, it can never succeed.
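To show why retrying cannot help, here is a minimal standalone sketch of the failure pattern. It does not use ArcadeDB classes: `SharedDatabase`, `installSnapshot`, and `attemptInstall` are hypothetical stand-ins, and only the exception type and message are taken from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

final class SnapshotInstallRepro {

  // Hypothetical stand-in for a server-managed database wrapper.
  // Only the close() behavior mirrors what the stack trace shows.
  static final class SharedDatabase {
    void close() {
      throw new UnsupportedOperationException(
          "Embedded database taken from the server are shared and therefore cannot be closed");
    }
  }

  // Hypothetical install step: close the live database, then swap in the snapshot.
  // The close() call throws before the swap is ever reached.
  static boolean installSnapshot(SharedDatabase db, List<String> log) {
    try {
      db.close();
      log.add("snapshot installed");
      return true;
    } catch (RuntimeException e) {
      log.add("Error during Raft snapshot installation: " + e.getClass().getSimpleName());
      return false;
    }
  }

  // Retries the install; returns how many attempts succeeded.
  static int attemptInstall(int maxAttempts, List<String> log) {
    SharedDatabase db = new SharedDatabase();
    int successes = 0;
    for (int i = 0; i < maxAttempts; i++) {
      if (installSnapshot(db, log)) {
        successes++;
      }
    }
    return successes;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    int ok = attemptInstall(3, log);
    System.out.println("successful installs: " + ok);
    log.forEach(System.out::println);
  }
}
```

Because the exception comes from an unconditional `throw`, no number of retries changes the outcome in this sketch, which matches the crash loop we see.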

## Impact

```
notifyInstallSnapshotFromLeader -> ServerDatabase.close() -> UnsupportedOperationException
  -> "Error during Raft snapshot installation" (every attempt, crash-loop)
    -> follower never rejoins -> cluster loses write quorum
      -> commits fail: com.arcadedb.network.binary.QuorumNotReachedException: Group commit entry failed: TimeoutException
        -> all client writes return HTTP 500
```

The cluster does **not** self-heal; it requires manual intervention (recycling/wiping the stuck followers).
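The quorum arithmetic behind this can be sketched as follows, assuming a plain Raft-style majority quorum (the class and method names are illustrative, and the cluster's actual quorum setting may be configured differently). With both followers stuck, only the leader is healthy, which is below the majority of a 3-node group.

```java
final class QuorumMath {

  // Smallest number of nodes that forms a strict majority.
  static int majority(int clusterSize) {
    return clusterSize / 2 + 1;
  }

  // True when enough healthy nodes remain to acknowledge a write.
  static boolean hasWriteQuorum(int clusterSize, int healthyNodes) {
    return healthyNodes >= majority(clusterSize);
  }

  public static void main(String[] args) {
    int clusterSize = 3;
    System.out.println("majority needed: " + majority(clusterSize));
    System.out.println("quorum with 1 stuck follower: " + hasWriteQuorum(clusterSize, 2));
    System.out.println("quorum with 2 stuck followers: " + hasWriteQuorum(clusterSize, 1));
  }
}
```

Under that assumption, a single stuck follower would still leave the cluster writable; it is the fact that both `-0` and `-2` hit the same deterministic failure that takes writes down.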

## Expected behavior

During snapshot installation the state machine should release/replace the shared server database through a supported path (e.g. the server's database-management API) rather than calling `ServerDatabase.close()`, so a follower can complete a full resync and rejoin the quorum.
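One possible shape for such a path, as a rough sketch only: the component that owns the database lifecycle performs the swap, and the shared handle keeps refusing `close()`. Every name here (`DatabaseRegistry`, `EmbeddedDb`, `SharedHandle`, `replaceWithSnapshot`) is hypothetical and is not ArcadeDB's API; the real fix may look quite different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SnapshotSwapSketch {

  // Hypothetical underlying database that the owner is allowed to close.
  static final class EmbeddedDb {
    final String contents;
    private volatile boolean open = true;

    EmbeddedDb(String contents) { this.contents = contents; }

    boolean isOpen() { return open; }

    void close() { open = false; }
  }

  // Hypothetical shared handle given to callers; it must not be closed by them.
  static final class SharedHandle {
    private final EmbeddedDb delegate;

    SharedHandle(EmbeddedDb delegate) { this.delegate = delegate; }

    String contents() { return delegate.contents; }

    boolean isOpen() { return delegate.isOpen(); }

    void close() {
      throw new UnsupportedOperationException("shared handle cannot be closed");
    }
  }

  // Hypothetical owner of the lifecycle: the only place that closes and reopens.
  static final class DatabaseRegistry {
    private final Map<String, EmbeddedDb> databases = new ConcurrentHashMap<>();

    void register(String name, String contents) {
      databases.put(name, new EmbeddedDb(contents));
    }

    SharedHandle get(String name) {
      return new SharedHandle(databases.get(name));
    }

    // Closes the old instance through the owner, then installs the snapshot copy.
    synchronized void replaceWithSnapshot(String name, String snapshotContents) {
      EmbeddedDb old = databases.remove(name);
      if (old != null) {
        old.close();
      }
      databases.put(name, new EmbeddedDb(snapshotContents));
    }
  }

  public static void main(String[] args) {
    DatabaseRegistry registry = new DatabaseRegistry();
    registry.register("mydb", "stale data");
    registry.replaceWithSnapshot("mydb", "snapshot data");
    System.out.println(registry.get("mydb").contents());
  }
}
```

The point of the design is that the state machine never touches `close()` on a shared handle; it asks the owner to do the swap, so the guard in the handle stays in place for ordinary callers.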

## Possibly related

- #4729 (HA snapshot serving timed out) — same area, different mechanism (leader-side serving timeout)
- #4748 (refactor `DatabaseReconciler` out of `ArcadeStateMachine`) — touches the same class

Happy to provide more logs if useful.

