Elasticsearch version:
# elasticsearch --version
Version: 2.2.0, Build: 8ff36d1/2016-01-27T13:32:39Z, JVM: 1.8.0_72-internal
JVM version:
# java -version
openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)
OS version: Debian Jessie on kernel 4.1.3.
Description of the problem including expected versus actual behavior:
Docs say that the primary and replica must carry the same sync_id in order to get instant recovery and avoid costly file copying. I restart 1 node out of 8 and see that most indices recover on the remaining nodes, even though the restarted node rejoined. Week-old indices recover too. To be fair, it did not work for me on 1.7.3 either. #6069 is closed, therefore I'm filing this issue.
Steps to reproduce:
- Restart one node.
- Wait until node rejoins.
Expected: immediate recovery for old indices, translog recovery for active indices. Nice and easy.
Actual: almost all indices (if not all) recover fully; active indices re-copy all data files (terabytes of them). Ingestion suffers from backpressure, people notice delayed indexing and say mean things about you. Sadness and disappointment.
It seems that some indices do not have a sync_id at all. I checked an old index that was recovering: the sync_id field was missing from the commit user_data, and it only appeared after a manual synced flush.
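For reference, here is how I pull the per-copy commit data shown below. This is a sketch assuming the shard-level index stats endpoint (which in 2.x includes the commit user_data); the host and index names are the ones from this report:

```shell
# List the sync_id (or null) of every shard copy of an index,
# using the shard-level stats endpoint of ES 2.x.
curl -s 'http://myhost/myindex-2016.02.29/_stats?level=shards' \
  | jq '.indices[].shards[][].commit.user_data.sync_id'
```

A copy that was never synced-flushed prints `null` here, which is exactly what the "Before" dump below shows.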
Before:
[
{
"routing": {
"state": "STARTED",
"primary": true,
"node": "hOM4Or2fTG-Do4ZkR9jIRQ",
"relocating_node": null
},
"commit": {
"id": "MJ19KOilFLcuYnGni1rE+A==",
"generation": 81,
"user_data": {
"translog_uuid": "OUU730pTTSOGk-07aJEMJw",
"translog_generation": "80"
},
"num_docs": 62732221
},
"shard_path": {
"state_path": "/disk/data6/es/main/main/nodes/0",
"data_path": "/disk/data6/es/main/main/nodes/0",
"is_custom_data_path": false
}
},
{
"routing": {
"state": "STARTED",
"primary": false,
"node": "yNaQ5IGARhGtu5FN8AvGUQ",
"relocating_node": null
},
"commit": {
"id": "4F6/8APNSb40wCqr89bs5g==",
"generation": 82,
"user_data": {
"translog_uuid": "GE4r0UDHTda2aLd-PwJ9Bg",
"translog_generation": "80"
},
"num_docs": 62732221
},
"shard_path": {
"state_path": "/disk/data5/es/main/main/nodes/0",
"data_path": "/disk/data5/es/main/main/nodes/0",
"is_custom_data_path": false
}
}
]
Then I do a manual synced flush:
# curl -X POST -s http://myhost/myindex-2016.02.29/_flush/synced | jq .
{
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"www-nginx-error-2016.03.01": {
"total": 2,
"successful": 2,
"failed": 0
}
}
After:
[
{
"routing": {
"state": "STARTED",
"primary": true,
"node": "hOM4Or2fTG-Do4ZkR9jIRQ",
"relocating_node": null
},
"commit": {
"id": "MJ19KOilFLcuYnGni3ExRw==",
"generation": 82,
"user_data": {
"translog_uuid": "OUU730pTTSOGk-07aJEMJw",
"sync_id": "AVNXkFKDdHUamVU5aCvy",
"translog_generation": "80"
},
"num_docs": 62732221
},
"shard_path": {
"state_path": "/disk/data6/es/main/main/nodes/0",
"data_path": "/disk/data6/es/main/main/nodes/0",
"is_custom_data_path": false
}
},
{
"routing": {
"state": "STARTED",
"primary": false,
"node": "yNaQ5IGARhGtu5FN8AvGUQ",
"relocating_node": null
},
"commit": {
"id": "4F6/8APNSb40wCqr8+yqgQ==",
"generation": 83,
"user_data": {
"translog_uuid": "GE4r0UDHTda2aLd-PwJ9Bg",
"sync_id": "AVNXkFKDdHUamVU5aCvy",
"translog_generation": "80"
},
"num_docs": 62732221
},
"shard_path": {
"state_path": "/disk/data5/es/main/main/nodes/0",
"data_path": "/disk/data5/es/main/main/nodes/0",
"is_custom_data_path": false
}
}
]
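To verify the "After" state quickly, the pasted shard array can be checked with jq: every copy should report one and the same sync_id. A sketch, where `shards.json` is a hypothetical file holding the array above:

```shell
# True when all shard copies in the pasted array share a single sync_id,
# i.e. when a restart should allow instant (file-skipping) recovery.
jq '[.[].commit.user_data.sync_id] | unique | length == 1' shards.json
```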