Nodes in process of shutting down should not respond to discovery pings

It looks like a node that is being shutdown will still reply to an discovery ping.  

Master `node-108` is gracefully shutdown, but still listed in the current nodes list. I guess this is because its just some milliseconds after the shutdown started.

```
[2017-11-08T15:01:00,779][INFO ][o.e.d.z.ZenDiscovery     ] [node-106] master_left [{node-108}{SnsBg56PRuintH0DINacJQ}{Q0yTDGUpQNag1dKS56yoHw}{node108.home.lan}{10.10.10.15:9300}], reason [shut_down]
[2017-11-08T15:01:00,781][WARN ][o.e.d.z.ZenDiscovery     ] [node-106] master left (reason = shut_down), current nodes: nodes: 
...
   {node-108}{SnsBg56PRuintH0DINacJQ}{Q0yTDGUpQNag1dKS56yoHw}{node108.home.lan}{10.10.10.15:9300}, master
```

Then 3 seconds later the ping responses coming back and `node-108` is still listed. I checked the log on `node-108`, but there is nothing later than `2017-11-08T15:01:00,843`, which makes me wonder why it is still listed in the ping list. 

```
[2017-11-08T15:01:03,785][TRACE][o.e.d.z.ZenDiscovery     ] [node-106] full ping responses:
...
        --> ping_response{node [{node-108}{SnsBg56PRuintH0DINacJQ}{Q0yTDGUpQNag1dKS56yoHw}{node108.home.lan}{10.10.10.15:9300}], id[44], master [{node-108}{SnsBg56PRuintH0DINacJQ}{Q0yTDGUpQNag1dKS56yoHw}{node108.home.lan}{10.10.10.15:9300}],cluster_state_version [420], cluster_name[my-cluster]}
...
[2017-11-08T15:01:03,789][WARN ][o.e.d.z.ZenDiscovery     ] [node-106] failed to connect to master [{node-108}{SnsBg56PRuintH0DINacJQ}{Q0yTDGUpQNag1dKS56yoHw}{node108.home.lan}{10.10.10.15:9300}], retrying...
```

Only explanation I have is that ping was replied before `2017-11-08T15:01:00,843` already, which would match up with the `ping starting` : `:00,843` vs `:00,782`

```
[2017-11-08T15:01:00,782][TRACE][o.e.d.z.ZenDiscovery     ] [node-106] starting to ping
[2017-11-08T15:01:00,782][TRACE][o.e.d.z.ZenDiscovery     ] [node-107] starting to ping
```

Then another ping is sent and 3 seconds later the reply does not list 108 anymore and a new master is elected.
```
[2017-11-08T15:01:03,798][TRACE][o.e.d.z.ZenDiscovery     ] [node-106] starting to ping
[2017-11-08T15:01:06,799][TRACE][o.e.d.z.ZenDiscovery     ] [node-106] full ping responses:
...
        --> ping_response{node [{node-106}{668jPO1YSweDaqRyQxMRwA}{LwYDV2eBQAC3ialrV4qEbQ}{node106.home.lan}{10.10.10.13:9300}], id[39], master [null],cluster_state_version [420], cluster_name[my-cluster]}
[2017-11-08T15:01:06,800][TRACE][o.e.d.z.ZenDiscovery     ] [node-106] candidate Candidate{node={node-106}{668jPO1YSweDaqRyQxMRwA}{LwYDV2eBQAC3ialrV4qEbQ}{node106.home.lan}{10.10.10.13:9300}, clusterStateVersion=420} won election
[2017-11-08T15:01:06,801][DEBUG][o.e.d.z.ZenDiscovery     ] [node-106] elected as master, waiting for incoming joins ([1] needed)
[2017-11-08T15:01:06,822][INFO ][o.e.c.s.ClusterService   ] [node-106] new_master {node-106}{668jPO1YSweDaqRyQxMRwA}{LwYDV2eBQAC3ialrV4qEbQ}{node106.home.lan}{10.10.10.13:9300}, reason: zen-disco-elected-as-master ([5] nodes joined)[{node-113}{1QzSsxO8ToqgUtfYycgmlw}{NUnBCQ_eSeSZjQV5PE1Jtg}{node113.home.lan}{10.10.10.20:9300}, {node-104}{59gFtGGvQGGy0LLVMIAgig}{Eg5gYe77SYuXB6vZfKKicg}{node104.home.lan}{10.10.10.214:9300}, {node-107}{_-P9koqwQAaBDpPyoflXbw}{j6BbufsTQq6kU-TFbAZW5Q}{node107.home.lan}{10.10.10.14:9300}, {node-105}{kt2uxSxNTDe9JhRuuWORYA}{VFf4A4bfTw6WoagiH6lpTw}{node105.home.lan}{10.10.10.215:9300}, {node-103}{TPSttSagTZq2VCDUhX5cGw}{puBEOLAARtmigK1y0lH5yw}{node103.home.lan}{10.10.10.213:9300}]
```

Had a quick chat with @ywelsch and it looks like a node in shutdown would still reply to a discovery ping request if the shutdown is not yet finished. 
Since such a node will be down shortly, it should not reply to a discovery ping. 
It could also speed up master election. I.e. in the above example an extra cycle of 3 seconds was added because the old master was still listed. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nodes in process of shutting down should not respond to discovery pings #27328

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Nodes in process of shutting down should not respond to discovery pings #27328

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions