There is a bug in the concurrent snapshot logic where the following situation involving three concurrent snapshots and a snapshot delete is broken and may lead to writing corrupted repository metadata:
- Start 3 snapshots for the same two indices
- Abort the one in the middle after the first snapshot finishes on the data node (as far as writing to the repository goes) but before the index gets out of the queued state for the second snapshot
- The third snapshot is moved to `FAILED` state once the middle snapshot completes, but has `null` for the shard generation for shards in the shared index
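The broken handoff above can be modeled with a minimal sketch. This is not the actual Elasticsearch implementation; the class and field names below are illustrative assumptions, chosen only to show how the queued successor ends up inheriting a `null` generation from the aborted snapshot instead of the generation written by the first, successful one:

```python
# Hypothetical model of the queued-shard handoff; names are illustrative,
# not taken from the Elasticsearch source.

class ShardSnapshot:
    def __init__(self, state, generation=None):
        self.state = state          # e.g. "SUCCESS", "ABORTED", "QUEUED", "FAILED"
        self.generation = generation

def complete_aborted(aborted, queued_successor):
    # Buggy handoff: the aborted snapshot never produced a shard generation,
    # so the queued successor is failed with generation=None instead of
    # falling back to the generation written by the first snapshot.
    queued_successor.state = "FAILED"
    queued_successor.generation = aborted.generation  # None -> corrupt metadata

first = ShardSnapshot("SUCCESS", generation="gen-1")  # finished writing gen-1
middle = ShardSnapshot("ABORTED")                     # aborted before writing
third = ShardSnapshot("QUEUED")                       # still queued behind middle

complete_aborted(middle, third)
# third.generation is now None, though "gen-1" exists in the repository
```

In this model the fix would be to fall back to the last successfully written generation (`"gen-1"`) when the aborted predecessor has none.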
This is a fairly unlikely scenario to run into since the abort must be timed just right, but it becomes somewhat more likely if the second snapshot has a larger diff from the first snapshot (so writing the files takes longer ... though even in this scenario the cluster state with the abort has to be applied on the data node right after finishing the last file).
=> fixing this asap but probably not before next week