Restoring a snapshot can leave cluster state in broken state #19774
Description
After a failed restore, the restore state in the cluster state is not properly cleaned up, which blocks future restores from starting. Only a full cluster restart clears the restore state.
Source: https://discuss.elastic.co/t/cannot-restore-snapshot-process-already-running/56746
Environment: 10-node cluster / Elasticsearch 2.3.4 / Debian Jessie / NFS repository type
The cluster state contains indices (e.g. index-a) that appear in the restore state but do not exist in the metadata. It also contains indices (e.g. index-b) with shards marked as INIT in the restore state even though all of their shard routings are marked as STARTED.
Excerpts from the anonymized cluster state:
"restore" : {
  "snapshots" : [ {
    "snapshot" : "backup",
    "repository" : "backup",
    "state" : "STARTED",
    "indices" : [ "index-a", "index-b", ... ],
    "shards" : [ {
      "index" : "index-a",
      "shard" : 0,
      "state" : "SUCCESS"
    }, {
      "index" : "index-a",
      "shard" : 1,
      "state" : "FAILURE"
    }, ...
    ..., {
      "index" : "index-b",
      "shard" : 0,
      "state" : "INIT"
    }, {
      "index" : "index-b",
      "shard" : 1,
      "state" : "INIT"
    }, {
      "index" : "index-b",
      "shard" : 2,
      "state" : "SUCCESS"
    },
There is no IndexMetaData or ShardRoutings entry for index-a in the cluster state, but both exist for index-b:
IndexMetaData:
...,
"index-b" : {
  "state" : "open",
  "settings" : {
    "index" : {
      "creation_date" : "...",
      ...
    }
  },
  "mappings" : {
    ...
  },
  "aliases" : ...
},
and
ShardRoutings:
...,
"index-b" : {
  "shards" : {
    "2" : [ {
      "state" : "STARTED",
      "primary" : false,
      "node" : "HHYUOZZ7TqeapIamJeYU8w",
      "relocating_node" : null,
      "shard" : 2,
      "index" : "index-b",
      "version" : 47,
      "allocation_id" : {
        "id" : "2GELW1i0S7i8KPf_moWugg"
      }
    }, {
      "state" : "STARTED",
      "primary" : true,
      "node" : "r-kHLyr8Q3SlTt6ToVlq1A",
      "relocating_node" : null,
      "shard" : 2,
      "index" : "index-b",
      "version" : 47,
      "allocation_id" : {
        "id" : "S_O2zIM_ShG9g1oNCfeVew"
      }
    }, {
      "state" : "STARTED",
      "primary" : false,
      "node" : "_7WX9LH2TYyHEMykc-1s9w",
      "relocating_node" : null,
      "shard" : 2,
      "index" : "index-b",
      "version" : 47,
      "allocation_id" : {
        "id" : "-61Vup2mSnOrRKG2GfrY5Q"
      }
    } ],
    "1" : [ {
      "state" : "STARTED",
      "primary" : true,
      "node" : "vfw5y4jDTh-tiI2Gqc5fyA",
      "relocating_node" : null,
      "shard" : 1,
      "index" : "index-b",
      "version" : 45,
      "allocation_id" : {
        "id" : "N4mQmap3TfGZWVsD8B7f3Q"
      }
    }, {
      "state" : "STARTED",
      "primary" : false,
      "node" : "HHYUOZZ7TqeapIamJeYU8w",
      "relocating_node" : null,
      "shard" : 1,
      "index" : "index-b",
      "version" : 45,
      "allocation_id" : {
        "id" : "6EJtCi5uR8aRzDcow69yQQ"
      }
    }, {
      "state" : "STARTED",
      "primary" : false,
      "node" : "6IFVYG3ZT7iS0CZR6hhUsw",
      "relocating_node" : null,
      "shard" : 1,
      "index" : "index-b",
      "version" : 45,
      "allocation_id" : {
        "id" : "ufHda_oUQ6qm5jHSV50Crw"
      }
    } ],
    "0" : [ {
      "state" : "STARTED",
      "primary" : false,
      "node" : "h6XBV034RL-psP0qXW9vmw",
      "relocating_node" : null,
      "shard" : 0,
      "index" : "index-b",
      "version" : 60,
      "allocation_id" : {
        "id" : "9QFXNtkqQYCkYfJ_HJ7mbA"
      }
    }, {
      "state" : "STARTED",
      "primary" : true,
      "node" : "lm1ClSoQT561rp7xvgmFbg",
      "relocating_node" : null,
      "shard" : 0,
      "index" : "index-b",
      "version" : 60,
      "allocation_id" : {
        "id" : "YrCTXmZ6RVm07a7d5UYPVg"
      }
    }, {
      "state" : "STARTED",
      "primary" : false,
      "node" : "xmuJY7BYRO6893-XTpTnTQ",
      "relocating_node" : null,
      "shard" : 0,
      "index" : "index-b",
      "version" : 60,
      "allocation_id" : {
        "id" : "x2u628VdSHS3p8vl9e7ufA"
      }
    } ]
  }
}, ...
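
For anyone wanting to check their own cluster for the same two symptoms, here is a minimal Python sketch that scans a parsed cluster state for them. It is not part of Elasticsearch. The function name `find_restore_inconsistencies` is hypothetical, and the top-level keys `metadata` and `routing_table` are assumptions about where the IndexMetaData and ShardRoutings excerpts above live in the full cluster state JSON; adjust them if your output differs.

```python
from typing import Any, Dict, List


def find_restore_inconsistencies(state: Dict[str, Any]) -> List[str]:
    """Describe restore entries that disagree with metadata or routing.

    `state` is the cluster state parsed into a dict. The key names
    "metadata" and "routing_table" are assumptions (see lead-in).
    """
    metadata = state.get("metadata", {}).get("indices", {})
    routing = state.get("routing_table", {}).get("indices", {})

    problems: List[str] = []
    for entry in state.get("restore", {}).get("snapshots", []):
        for shard in entry.get("shards", []):
            index = shard["index"]
            shard_id = shard["shard"]
            restore_state = shard["state"]

            # Symptom 1 (index-a): the restore tracks a shard of an index
            # that has no index metadata at all.
            if index not in metadata:
                problems.append(
                    f"{index}[{shard_id}]: restore state {restore_state}, "
                    "but no index metadata"
                )
                continue

            # Symptom 2 (index-b): the restore still says INIT although
            # every routed copy of the shard is STARTED.
            copies = (
                routing.get(index, {})
                .get("shards", {})
                .get(str(shard_id), [])
            )
            all_started = bool(copies) and all(
                copy.get("state") == "STARTED" for copy in copies
            )
            if restore_state == "INIT" and all_started:
                problems.append(
                    f"{index}[{shard_id}]: restore state INIT, "
                    "but all shard routings are STARTED"
                )
    return problems
```

Run against a state shaped like the excerpts, this would be expected to flag both index-a shards and the INIT shards of index-b, while leaving index-b shard 2 (SUCCESS) alone. It only reports the mismatch and does not attempt to repair the restore state.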