The following test fails 100% of the time:
```java
public void testOutOfOrderFinalization() throws Exception {
    internalCluster().startMasterOnlyNode();
    final List<String> dataNodes = internalCluster().startDataOnlyNodes(2);
    final String index1 = "index-1";
    final String index2 = "index-2";
    createIndexWithContent(index1, dataNodes.get(0), dataNodes.get(1));
    createIndexWithContent(index2, dataNodes.get(1), dataNodes.get(0));
    final String repository = "test-repo";
    createRepository(repository, "mock");
    // Block the data node holding index-2 so that snapshot-1 cannot finish yet
    blockNodeWithIndex(repository, index2);
    final ActionFuture<CreateSnapshotResponse> snapshot1 = clusterAdmin()
        .prepareCreateSnapshot(repository, "snapshot-1")
        .setIndices(index1, index2)
        .setWaitForCompletion(true)
        .execute();
    awaitNumberOfSnapshotsInProgress(1);
    // snapshot-2 only covers index-1, so it can finalize while snapshot-1 is still blocked
    final ActionFuture<CreateSnapshotResponse> snapshot2 = clusterAdmin()
        .prepareCreateSnapshot(repository, "snapshot-2")
        .setIndices(index1)
        .setWaitForCompletion(true)
        .execute();
    assertSuccessful(snapshot2);
    // Let snapshot-1 finish; it now finalizes after the later snapshot-2
    unblockAllDataNodes(repository);
    final SnapshotInfo sn1 = assertSuccessful(snapshot1);
    // Deleting snapshot-1 runs shard-level cleanup against the shard's current generation
    assertAcked(startDeleteSnapshot(repository, sn1.snapshot().getSnapshotId().getName()).get());
    // snapshot-2 must still be readable after snapshot-1 has been deleted
    assertThat(
        clusterAdmin().prepareSnapshotStatus().setSnapshots("snapshot-2").setRepository(repository).get().getSnapshots(),
        hasSize(1)
    );
}
```
=> If a shard snapshot in an earlier snapshot is successful, but a later snapshot containing that shard finalizes before the earlier snapshot does, the shard-level metadata gets corrupted in a very subtle way: the shard points at an incorrect generation, yet all the `snap-` blobs in the shard remain correct until the next delete (so `_status` API calls still work), while data blobs may still be deleted incorrectly for the shard.
=> on it fixing this.
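To make the ordering problem concrete, here is a minimal, hypothetical Java model (not Elasticsearch code). The repository's root metadata is reduced to a map from a shard key to the shard generation the repository points at, and generations are modelled as increasing `long` counters. That is an assumption made only so that "newer" can be expressed as a comparison; the class name, `naiveFinalize`, `guardedFinalize` and the `index-1/0` shard key are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model: root metadata maps a shard key to the shard-level
 * generation the repository currently points at. Generations are plain
 * increasing longs here purely for illustration.
 */
public final class OutOfOrderFinalizationSketch {

    /** Naive finalization: blindly overwrite each shard's generation. */
    public static void naiveFinalize(Map<String, Long> rootShardGens, Map<String, Long> snapshotShardGens) {
        rootShardGens.putAll(snapshotShardGens);
    }

    /** Guarded finalization: never move a shard back to an older generation. */
    public static void guardedFinalize(Map<String, Long> rootShardGens, Map<String, Long> snapshotShardGens) {
        snapshotShardGens.forEach((shard, gen) -> rootShardGens.merge(shard, gen, Math::max));
    }

    public static void main(String[] args) {
        // snapshot-1 wrote generation 1 for index-1/0 (index-2 was blocked);
        // snapshot-2 then built on it and wrote generation 2 for index-1/0.
        Map<String, Long> snapshot1Gens = Map.of("index-1/0", 1L, "index-2/0", 1L);
        Map<String, Long> snapshot2Gens = Map.of("index-1/0", 2L);

        Map<String, Long> naive = new HashMap<>();
        naiveFinalize(naive, snapshot2Gens); // snapshot-2 finalizes first
        naiveFinalize(naive, snapshot1Gens); // snapshot-1 finalizes later and overwrites
        System.out.println("naive:   " + naive.get("index-1/0"));

        Map<String, Long> guarded = new HashMap<>();
        guardedFinalize(guarded, snapshot2Gens);
        guardedFinalize(guarded, snapshot1Gens);
        System.out.println("guarded: " + guarded.get("index-1/0"));
    }
}
```

Running `main` prints `naive:   1` and then `guarded: 2`. With the naive merge, finalizing snapshot-1 after snapshot-2 moves `index-1/0` back to the older generation, which is the stale shard pointer described above; a later delete that trusts that generation would not know about the files snapshot-2 added. The `Math::max` guard is just one way to express "never regress a shard generation" in this toy model and is not necessarily how the actual fix works.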