Today all the shard-level operations during snapshot deletion log exceptions on failure, but the deletion process continues regardless. This makes sense in the pre-7.6.0 repository format because the shard-level operations happen after updating the root `RepositoryData` blob, at which point the deletion cannot really fail. But since 7.6.0 we do the shard-level operations first, in order to obtain the names of all the new `BlobStoreIndexShardSnapshots` blobs. That means we could choose to be stricter and bail out, failing the deletion process before updating the root `RepositoryData` blob. IMO there's no great reason to be lenient here: a failure to update the shard-level metadata is surely serious enough to halt the process, and stopping on failure avoids bringing the repository into a state where the shard-level metadata is inconsistent with the root.
Opening this for discussion: should we treat these exceptions more seriously now?
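
To make the two options concrete, here is a minimal, self-contained sketch of the ordering described above. It is not the actual repository code: `Repo`, `ShardOp`, `deleteLenient` and `deleteStrict` are hypothetical names invented for illustration, and the "root update" is reduced to setting a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types exist in Elasticsearch.
class StrictShardDeleteSketch {

    /** Hypothetical stand-in for the root RepositoryData blob. */
    static final class Repo {
        final List<String> shardMetadataBlobs = new ArrayList<>();
        boolean rootUpdated = false;
    }

    /** Hypothetical shard-level step: returns the name of the new shard-level metadata blob. */
    interface ShardOp {
        String updateShardMetadata(int shardId) throws Exception;
    }

    /** Lenient variant: log shard-level failures and update the root anyway. */
    static void deleteLenient(Repo repo, int shardCount, ShardOp op) {
        List<String> newBlobs = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            try {
                newBlobs.add(op.updateShardMetadata(shard));
            } catch (Exception e) {
                // Failure is only logged; the root will not know about this shard's new blob.
                System.err.println("shard " + shard + " failed: " + e.getMessage());
            }
        }
        repo.shardMetadataBlobs.addAll(newBlobs);
        repo.rootUpdated = true;
    }

    /** Strict variant: any shard-level failure aborts before the root is touched. */
    static void deleteStrict(Repo repo, int shardCount, ShardOp op) throws Exception {
        List<String> newBlobs = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            // An exception here propagates and skips the root update below.
            newBlobs.add(op.updateShardMetadata(shard));
        }
        repo.shardMetadataBlobs.addAll(newBlobs);
        repo.rootUpdated = true;
    }

    public static void main(String[] args) {
        // Simulated shard operation that fails for shard 1.
        ShardOp failing = shard -> {
            if (shard == 1) {
                throw new java.io.IOException("simulated write failure");
            }
            return "index-" + shard;
        };

        Repo lenient = new Repo();
        deleteLenient(lenient, 3, failing);
        System.out.println("lenient: rootUpdated=" + lenient.rootUpdated);

        Repo strict = new Repo();
        try {
            deleteStrict(strict, 3, failing);
        } catch (Exception e) {
            System.out.println("strict: aborted, rootUpdated=" + strict.rootUpdated);
        }
    }
}
```

In the lenient variant the root ends up updated without an entry for the failed shard, which is the inconsistency in question; in the strict variant the root is left as it was and the deletion can be retried.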