Setup :
5 identically configured nodes :
btrainer-1.182 (192.168.1.182) (Current Master before incident)
btrainer-1.186 (192.168.1.186)
btrainer-1.136 (192.168.1.136)
btrainer-13.137 (192.168.13.137)
btrainer-1.138 (192.168.1.138)
ES Configs : (version : 0.19.8)
cluster.name: btrainer
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "192.168.1.182:10300", "192.168.1.186:10300", "192.168.1.136:10300", "192.168.13.137:10300", "192.168.1.138:10300" ]
http.port: 10200
index.number_of_replicas: 4
transport.tcp.port: 10300
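For reference, we have not set discovery.zen.minimum_master_nodes anywhere. If a quorum guard is relevant to this kind of split, the usual value for our 5 master-eligible nodes would presumably be (this is a sketch, not something we currently run):

```yaml
# Not set in our current config; with 5 master-eligible nodes a quorum
# would be floor(5/2) + 1 = 3, i.e. in each node's elasticsearch.yml:
discovery.zen.minimum_master_nodes: 3
```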
Java Options :
-Des-foreground=yes
-Des.path.home=/elasticsearch
-Xms4096m
-Xmx20480m
-Djline.enabled=true
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-cp /elasticsearch/lib/*:/elasticsearch/lib/sigar/*
org.elasticsearch.bootstrap.ElasticSearch
Problem :
This problem repeats itself every 5-12 hours. While everything is running smoothly (cluster is green), one node goes down and every node then forms its own cluster (not a 1/4 split, but a 1/1/1/1/1 split). The sample incident happened at exactly 22:06; we have a job checking the cluster state every minute. This cluster is mainly used for training, so we get heavy traffic spikes on both reads and writes when jobs are triggered (plus some continuous small reads).
- What happened to btrainer-1.138 ?
- Even if one node (btrainer-1.138) misbehaves, why didn't the cluster split 1/4? Why did the other nodes lose the master, btrainer-1.182 ?
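The minute-interval cluster-state check mentioned above might look like the following sketch. The URL, thresholds, and function names here are assumptions for illustration, not our exact script; it just calls the standard _cluster/health endpoint on the HTTP port from the config above.

```python
import json
import urllib.request

# Hypothetical endpoint: master node's HTTP port from the config above.
HEALTH_URL = "http://192.168.1.182:10200/_cluster/health"

def evaluate_health(health):
    """Return (ok, reason) from a parsed _cluster/health response."""
    status = health.get("status", "red")
    nodes = health.get("number_of_nodes", 0)
    if status != "green":
        return False, "cluster status is %s" % status
    if nodes < 5:
        return False, "only %d of 5 nodes joined" % nodes
    return True, "green with all 5 nodes"

def check_cluster(url=HEALTH_URL):
    """Fetch the health endpoint and evaluate it (needs a live cluster)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return evaluate_health(json.load(resp))
```

A 1/1/1/1/1 split shows up immediately in such a check: each node still reports status, but `number_of_nodes` drops to 1 on every node at once.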
Logs :
You can check the logs from all the nodes here : https://gist.github.com/3510448