Nodes are always suspect in StatefulSet #17132
Description
BUG REPORT
I run CockroachDB version 1.0.3 in a StatefulSet on Google Container Engine 1.7.1. The underlying node pool uses preemptible nodes, so the CockroachDB pods get relocated at least every 24 hours, which makes a nice disruption test for CockroachDB. To make sure the preemptible nodes don't all die at the same moment, I kill them at a random time between 12 and 24 hours.
When the cluster is first created, all nodes show up as green / healthy, but after being relocated to different hosts they show up as orange / suspect. The cluster seems to work fine nonetheless.
Could this be related to StatefulSet nodes getting a new IP address every time they are recreated? Or to the uptime being less than a certain value?
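The report does not show how the nodes are killed, so the following is only a minimal sketch of the random 12-to-24-hour lifetime described above. The names `pick_node_lifetime_seconds` and `should_kill` are hypothetical, and the actual deletion of a node (for example through the cloud provider's tooling) is deliberately left out.

```python
import random

# Bounds taken from the description: nodes are killed between 12 and 24 hours.
MIN_LIFETIME_HOURS = 12
MAX_LIFETIME_HOURS = 24


def pick_node_lifetime_seconds(rng=random):
    """Pick a random lifetime between 12 and 24 hours, in whole seconds.

    Drawing a separate value per node spreads the kills out, so the
    nodes do not all disappear at the same moment.
    """
    hours = rng.uniform(MIN_LIFETIME_HOURS, MAX_LIFETIME_HOURS)
    return int(hours * 3600)


def should_kill(node_age_seconds, lifetime_seconds):
    """Return True once a node has outlived its randomly chosen lifetime."""
    return node_age_seconds >= lifetime_seconds
```

A periodic job could assign each node a lifetime once and then call `should_kill` with the node's current age to decide whether to remove it.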

Startup log for cockroachdb-0:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 00:37:47 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 00:37:47 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-0.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] running on machine: cockroachdb-0 | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] file created at: 2017/07/20 00:37:47 | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 00:37:48 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-1:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 03:20:50 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 03:20:50 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-1.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] running on machine: cockroachdb-1 | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] file created at: 2017/07/20 03:20:50 | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 03:20:50 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-2:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-19 14:38:55 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-19 14:38:55 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-2.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] running on machine: cockroachdb-2 | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] file created at: 2017/07/19 14:38:55 | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-19 14:38:55 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-3:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-19 22:31:50 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-19 22:31:50 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-3.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] running on machine: cockroachdb-3 | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] file created at: 2017/07/19 22:31:50 | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-19 22:31:50 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-4:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 03:19:17 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 03:19:17 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-4.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] running on machine: cockroachdb-4 | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] file created at: 2017/07/20 03:19:17 | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 03:19:17 | INFO | system total memory: 2.0 GiB | server/config.go:375

- What did you do?
  - Create a CockroachDB cluster using https://github.com/cockroachdb/cockroach/blob/v1.0.3/cloud/kubernetes/cockroachdb-statefulset.yaml
  - Let it run on top of a preemptible node pool in Google Container Engine
  - Wait at least 24 hours
- What did you expect to see?
Relocated nodes showing up as green / healthy
- What did you see instead?
Relocated nodes showing up as orange / suspect