
Snapshot Repositories Containing a Mix of pre and post v7.6 Snapshots Can Become Corrupted #57798

@original-brownbear

Repositories that contain snapshots from both before and after version 7.6 can become dysfunctional, and in some cases corrupted, by ES v7.7 clusters because of a mistake in how RepositoryData is cached.
The cached RepositoryData includes ShardGenerations with numeric generation values that might not be reliable: any failed snapshot finalization that had at least one individual shard snapshot causes an incorrect shard generation to be tracked.

This leads to two stages of broken behavior:

  1. As long as there is still at least one pre-7.6 snapshot in the repository, new ShardGenerations will not be physically written to the repository. The issue shows up as errors like the one below while creating new snapshots of affected shards. This leads to PARTIAL snapshots because the affected shards will never snapshot successfully.
 [2020-06-04T00:00:00.206Z][WARN][org.elasticsearch.snapshots.SnapshotShardsService] [instance-0000000012] [[xxx][0]][xxx:xxx/Lnmw3145RGubRA7oiWqUsg] failed to snapshot shard
java.nio.file.NoSuchFileException: Blob [snapshots/585b4ab8ad5e44d8a8144df80846222b/indices/IzG2oACDQbiD_1479qE4IA/0/index-7866] does not exist

Snapshot deletes will also log the same error but will otherwise work. This leads to the second stage of the issue described below.
At this stage of the problem, the repository can be fixed and further corruption prevented by setting the repository setting cache_repository_data to false.
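
As a rough sketch of the stage-1 workaround: re-registering a repository replaces its settings, so the existing settings need to be carried over when cache_repository_data is added. The helper name and the sample repository definition below are hypothetical and only illustrate the shape of the request body. The result would be sent via `PUT _snapshot/<repository name>`.

```python
import copy
import json


def with_repository_data_cache_disabled(repo_definition):
    """Return a copy of a repository definition with cache_repository_data set to False.

    `repo_definition` is assumed to have the shape of one entry in a
    GET _snapshot/<name> response: {"type": ..., "settings": {...}}.
    """
    updated = copy.deepcopy(repo_definition)
    # Keep all existing settings and only add/override the cache flag.
    updated.setdefault("settings", {})["cache_repository_data"] = False
    return updated


# Hypothetical existing definition, used for illustration only.
existing = {"type": "fs", "settings": {"location": "/mount/backups/my_repo"}}

body = with_repository_data_cache_disabled(existing)
print(json.dumps(body, indent=2))
```

The original definition is left unmodified, so it can be kept around to restore the previous settings later.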

  2. Once all the pre-7.6 snapshots have been deleted from a repository, the broken RepositoryGenerations that were incorrectly cached will be written to the repository physically.
    Once this has happened, the repository is physically corrupted, and the only way to fix it at this point is to delete all snapshots referencing the broken shards.
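
To get a first idea of which shards are affected, the `NoSuchFileException` lines shown under stage 1 can be scanned for the missing blob paths. The following is a minimal sketch that assumes the blob path ends in `indices/<repository index id>/<shard>/index-<generation>`, as in the log excerpt above. The function name is hypothetical, and the index ID found this way is the repository's internal ID for the index, not necessarily its name.

```python
import re

# Matches the "Blob [...] does not exist" message and pulls out the
# repository index id, shard number, and shard generation from the path.
BLOB_MISSING = re.compile(
    r"Blob \[[^\]]*indices/(?P<index_id>[^/\]]+)/(?P<shard>\d+)"
    r"/index-(?P<generation>[^\]]+)\] does not exist"
)


def find_missing_shard_generations(log_text):
    """Return sorted, de-duplicated (index_id, shard, generation) tuples."""
    found = set()
    for match in BLOB_MISSING.finditer(log_text):
        found.add(
            (match.group("index_id"), int(match.group("shard")), match.group("generation"))
        )
    return sorted(found)


# Log line taken from the excerpt above.
sample = (
    "java.nio.file.NoSuchFileException: Blob "
    "[snapshots/585b4ab8ad5e44d8a8144df80846222b/indices/IzG2oACDQbiD_1479qE4IA/0/index-7866] "
    "does not exist"
)

affected = find_missing_shard_generations(sample)
print(affected)
```

This only lists the shards that have logged the error so far. Mapping them to the snapshots that reference them still has to be done against the repository itself.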

We will do two steps of fixing things here:

cc @ywelsch, @paulcoghlan
