Nodes are always suspect in Statefulset #17132

@JorritSalverda

BUG REPORT

I run CockroachDB 1.0.3 in a StatefulSet on Google Container Engine 1.7.1. The underlying node pool uses preemptible nodes, so the CockroachDB pods get relocated at least every 24 hours, which makes a nice disruption test for CockroachDB. To make sure the preemptible nodes don't all die at the same moment, I kill them at random after 12 to 24 hours.
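The staggered kill schedule can be sketched as follows. This is only an illustration of the idea, not the tooling actually used here: the function name `pick_kill_delays` and the node names are hypothetical, and the uniform distribution is an assumption.

```python
import random

MIN_HOURS = 12  # lower bound from the report
MAX_HOURS = 24  # upper bound from the report


def pick_kill_delays(node_names, rng=None):
    """Return a kill delay in seconds per node, drawn uniformly from 12 to 24 hours."""
    rng = rng or random.Random()
    return {
        name: rng.uniform(MIN_HOURS * 3600, MAX_HOURS * 3600)
        for name in node_names
    }


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for node, delay in pick_kill_delays(["node-a", "node-b", "node-c"]).items():
        print(f"{node}: kill after {delay / 3600:.1f} h")
```

Drawing an independent delay per node makes it unlikely that several nodes are killed at the same moment.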

When the cluster is first created, all nodes show up as green / healthy, but after being relocated to different hosts they show up as orange / suspect. The cluster seems to work fine nonetheless.

Could this be related to StatefulSet nodes getting a new IP every time they get recreated? Or to the uptime being less than a certain value?

Startup log for cockroachdb-0:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 00:37:47 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 00:37:47 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-0.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] running on machine: cockroachdb-0 | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | [config] file created at: 2017/07/20 00:37:47 | util/log/clog.go:1011
2017-07-20 00:37:47 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 00:37:48 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-1:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 03:20:50 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 03:20:50 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-1.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] running on machine: cockroachdb-1 | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | [config] file created at: 2017/07/20 03:20:50 | util/log/clog.go:1011
2017-07-20 03:20:50 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 03:20:50 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-2:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-19 14:38:55 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-19 14:38:55 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-2.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] running on machine: cockroachdb-2 | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | [config] file created at: 2017/07/19 14:38:55 | util/log/clog.go:1011
2017-07-19 14:38:55 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-19 14:38:55 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-3:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-19 22:31:50 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-19 22:31:50 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-3.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] running on machine: cockroachdb-3 | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | [config] file created at: 2017/07/19 22:31:50 | util/log/clog.go:1011
2017-07-19 22:31:50 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-19 22:31:50 | INFO | system total memory: 2.0 GiB | server/config.go:375

Startup log for cockroachdb-4:

Timestamp | Severity | Message | Source
-- | -- | -- | --
2017-07-20 03:19:17 | WARNING | RUNNING IN INSECURE MODE! | cli/start.go:587
2017-07-20 03:19:17 | INFO | line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓ | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] arguments: [/cockroach/cockroach start --logtostderr --insecure --host cockroachdb-4.cockroachdb.development.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-public] | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] binary: CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] running on machine: cockroachdb-4 | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | [config] file created at: 2017/07/20 03:19:17 | util/log/clog.go:1011
2017-07-20 03:19:17 | INFO | CockroachDB CCL v1.0.3 (linux amd64, built 2017/07/06 17:46:06, go1.8.3) | cli/start.go:593
2017-07-20 03:19:17 | INFO | system total memory: 2.0 GiB | server/config.go:375
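
To see at a glance when each pod last restarted, the pipe-separated log rows above can be summarized with a small script. This is a rough sketch that assumes the four-column layout shown here (timestamp, severity, message, source); the helper name `pod_start_times` is made up for illustration.

```python
from datetime import datetime

PREFIX = "[config] running on machine: "


def pod_start_times(log_text):
    """Map each machine name to the timestamps at which it logged a startup."""
    starts = {}
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split(" | ")]
        if len(parts) != 4:
            continue  # not a four-column log row
        timestamp, _severity, message, _source = parts
        if message.startswith(PREFIX):
            when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
            starts.setdefault(message[len(PREFIX):], []).append(when)
    return starts


if __name__ == "__main__":
    # Two rows copied from the logs in this report.
    sample = (
        "2017-07-20 00:37:47 | INFO | [config] running on machine: cockroachdb-0 | util/log/clog.go:1011\n"
        "2017-07-19 14:38:55 | INFO | [config] running on machine: cockroachdb-2 | util/log/clog.go:1011\n"
    )
    for machine, times in sorted(pod_start_times(sample).items()):
        print(machine, times[-1].isoformat(sep=" "))
```

Separator rows and other messages are skipped, so the whole log dump can be passed in unchanged.
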
  • What did you do?
  1. Create a CockroachDB cluster using https://github.com/cockroachdb/cockroach/blob/v1.0.3/cloud/kubernetes/cockroachdb-statefulset.yaml
  2. Let it run on top of a preemptible node pool in Google Container Engine
  3. Wait at least 24 hours
  • What did you expect to see?

Relocated nodes showing up as green / healthy

  • What did you see instead?

Relocated nodes showing up as orange / suspect
