Restoring a snapshot can leave cluster state in broken state #19774

@ywelsch

After a failed restore, the restore state in the cluster state is not properly cleaned up, which blocks future restores from starting. Only a full cluster restart clears the restore state.

Source: https://discuss.elastic.co/t/cannot-restore-snapshot-process-already-running/56746
Environment: 10 node cluster / elasticsearch 2.3.4 / Debian Jessie / NFS repository type

The cluster state shows indices (e.g. index-a) that exist in the snapshot restore state but not in the metadata. There are also indices (e.g. index-b) with shards marked as INIT in the restore state even though all of their shard routings are marked as STARTED.

Excerpts from the anonymized cluster state:

  "restore" : {
    "snapshots" : [ {
      "snapshot" : "backup",
      "repository" : "backup",
      "state" : "STARTED",
      "indices" : [ "index-a", "index-b", ... ],
      "shards" : [  {
        "index" : "index-a",
        "shard" : 0,
        "state" : "SUCCESS"
      }, {
        "index" : "index-a",
        "shard" : 1,
        "state" : "FAILURE"
      }, ...

     ..., {
        "index" : "index-b",
        "shard" : 0,
        "state" : "INIT"
      }, {
        "index" : "index-b",
        "shard" : 1,
        "state" : "INIT"
      }, {
        "index" : "index-b",
        "shard" : 2,
        "state" : "SUCCESS"
      },

There is no IndexMetaData or ShardRoutings entry for index-a in the cluster state, but there is for index-b:

IndexMetaData:

      ...,
      "index-b" : {
        "state" : "open",
        "settings" : {
          "index" : {
            "creation_date" : "...",
            ...
          }
        },
        "mappings" : {
          ...
        },
        "aliases" : ...
      },

and

ShardRoutings:

      ...,
      "index-b" : {
        "shards" : {
          "2" : [ {
            "state" : "STARTED",
            "primary" : false,
            "node" : "HHYUOZZ7TqeapIamJeYU8w",
            "relocating_node" : null,
            "shard" : 2,
            "index" : "index-b",
            "version" : 47,
            "allocation_id" : {
              "id" : "2GELW1i0S7i8KPf_moWugg"
            }
          }, {
            "state" : "STARTED",
            "primary" : true,
            "node" : "r-kHLyr8Q3SlTt6ToVlq1A",
            "relocating_node" : null,
            "shard" : 2,
            "index" : "index-b",
            "version" : 47,
            "allocation_id" : {
              "id" : "S_O2zIM_ShG9g1oNCfeVew"
            }
          }, {
            "state" : "STARTED",
            "primary" : false,
            "node" : "_7WX9LH2TYyHEMykc-1s9w",
            "relocating_node" : null,
            "shard" : 2,
            "index" : "index-b",
            "version" : 47,
            "allocation_id" : {
              "id" : "-61Vup2mSnOrRKG2GfrY5Q"
            }
          } ],
          "1" : [ {
            "state" : "STARTED",
            "primary" : true,
            "node" : "vfw5y4jDTh-tiI2Gqc5fyA",
            "relocating_node" : null,
            "shard" : 1,
            "index" : "index-b",
            "version" : 45,
            "allocation_id" : {
              "id" : "N4mQmap3TfGZWVsD8B7f3Q"
            }
          }, {
            "state" : "STARTED",
            "primary" : false,
            "node" : "HHYUOZZ7TqeapIamJeYU8w",
            "relocating_node" : null,
            "shard" : 1,
            "index" : "index-b",
            "version" : 45,
            "allocation_id" : {
              "id" : "6EJtCi5uR8aRzDcow69yQQ"
            }
          }, {
            "state" : "STARTED",
            "primary" : false,
            "node" : "6IFVYG3ZT7iS0CZR6hhUsw",
            "relocating_node" : null,
            "shard" : 1,
            "index" : "index-b",
            "version" : 45,
            "allocation_id" : {
              "id" : "ufHda_oUQ6qm5jHSV50Crw"
            }
          } ],
          "0" : [ {
            "state" : "STARTED",
            "primary" : false,
            "node" : "h6XBV034RL-psP0qXW9vmw",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "index-b",
            "version" : 60,
            "allocation_id" : {
              "id" : "9QFXNtkqQYCkYfJ_HJ7mbA"
            }
          }, {
            "state" : "STARTED",
            "primary" : true,
            "node" : "lm1ClSoQT561rp7xvgmFbg",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "index-b",
            "version" : 60,
            "allocation_id" : {
              "id" : "YrCTXmZ6RVm07a7d5UYPVg"
            }
          }, {
            "state" : "STARTED",
            "primary" : false,
            "node" : "xmuJY7BYRO6893-XTpTnTQ",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "index-b",
            "version" : 60,
            "allocation_id" : {
              "id" : "x2u628VdSHS3p8vl9e7ufA"
            }
          } ]
        }
      }, ...
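
The two inconsistencies described above can be checked mechanically against a cluster state dump. The following is a minimal sketch, not part of Elasticsearch: the function name `find_restore_inconsistencies` is hypothetical, and it assumes the dump is a parsed JSON object in which the excerpts above sit under the top-level keys `restore`, `metadata.indices` and `routing_table.indices`. Adjust the key paths if your dump is laid out differently.

```python
def find_restore_inconsistencies(cluster_state):
    """Return (missing_indices, stale_shards) for a parsed cluster state.

    missing_indices: indices listed in the restore state but absent from metadata.
    stale_shards: (index, shard) pairs marked INIT in the restore state although
                  every routing copy of that shard is STARTED.
    The key layout used here is an assumption based on the excerpts above.
    """
    metadata_indices = cluster_state.get("metadata", {}).get("indices", {})
    routing_indices = cluster_state.get("routing_table", {}).get("indices", {})

    missing_indices = set()
    stale_shards = []
    for entry in cluster_state.get("restore", {}).get("snapshots", []):
        for shard in entry.get("shards", []):
            index, shard_id, state = shard["index"], shard["shard"], shard["state"]
            if index not in metadata_indices:
                missing_indices.add(index)
                continue
            # Routing table keys shard numbers as strings ("0", "1", ...).
            copies = routing_indices.get(index, {}).get("shards", {}).get(str(shard_id), [])
            if state == "INIT" and copies and all(c["state"] == "STARTED" for c in copies):
                stale_shards.append((index, shard_id))
    return sorted(missing_indices), sorted(stale_shards)


# Reduced version of the state from this report (one routing copy per shard).
example_state = {
    "restore": {"snapshots": [{
        "snapshot": "backup",
        "repository": "backup",
        "state": "STARTED",
        "shards": [
            {"index": "index-a", "shard": 0, "state": "SUCCESS"},
            {"index": "index-a", "shard": 1, "state": "FAILURE"},
            {"index": "index-b", "shard": 0, "state": "INIT"},
            {"index": "index-b", "shard": 1, "state": "INIT"},
            {"index": "index-b", "shard": 2, "state": "SUCCESS"},
        ],
    }]},
    "metadata": {"indices": {"index-b": {"state": "open"}}},
    "routing_table": {"indices": {"index-b": {"shards": {
        "0": [{"state": "STARTED", "primary": True}],
        "1": [{"state": "STARTED", "primary": True}],
        "2": [{"state": "STARTED", "primary": True}],
    }}}},
}

missing, stale = find_restore_inconsistencies(example_state)
print(missing)  # ['index-a']
print(stale)    # [('index-b', 0), ('index-b', 1)]
```

On a real cluster, the input would be the JSON returned by the cluster state API, loaded with `json.load`.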
