When restarting a cluster from a green state, each shard appears to undergo some form of checksum verification before it is brought online.
Is there a way to journal writes so that recovery is much faster, the way the XFS filesystem does it?
Recovery would then only review the data that was being written at the time of the outage or shutdown, so only the in-progress writes need to be checked.
For a clean shutdown, maybe a complete cluster restart command could tell all nodes to shut down in a clean state and then turn off, allowing a near-instantaneous recovery on startup. For example: stop allocation, flush all translogs, then shut down, etc.
It would take a lot of the pain out of cluster restarts.
Just an idea
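
To make the "stop allocation, flush, shut down" sequence concrete, here is a rough sketch of doing it by hand today. It assumes the cluster is Elasticsearch (the shards/translog/green wording suggests that, but this post does not say so) and that the cluster settings and flush REST endpoints behave as I understand them. The helper names `clean_restart_steps`, `reenable_allocation_step` and `send` are made up for illustration. This is not the journaling idea itself, only the manual steps a single restart command could wrap.

```python
import json
import urllib.request


def clean_restart_steps():
    """Ordered (method, path, body) requests to send before stopping the nodes."""
    return [
        # 1. Stop shard allocation so the cluster does not start moving
        #    shards around while nodes are going away.
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": "none"}}),
        # 2. Flush all indices so translog contents are committed to disk.
        ("POST", "/_flush", None),
    ]


def reenable_allocation_step():
    """Request to send once all nodes are back; null resets the setting."""
    return ("PUT", "/_cluster/settings",
            {"persistent": {"cluster.routing.allocation.enable": None}})


def send(base_url, method, path, body=None):
    """Send one request to the cluster and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like `for step in clean_restart_steps(): send("http://localhost:9200", *step)`, then stopping and starting the nodes by whatever means you normally use, then `send("http://localhost:9200", *reenable_allocation_step())`. The address is an assumption too.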