
storage: node does not become live after restart until many hours later #28179

@jethrogb

Description

BUG REPORT

Please describe the issue you observed, and any steps we can take to reproduce it:

  • Which version of CockroachDB are you using?
    2.0.4

  • What did you do?
    Rebooted a node

  • What did you expect to see?
    The node to come back up and become live again

  • What did you see instead?
    CockroachDB starts, but the node does not become live

  • What was the impact?
    Minor; the rest of the cluster remains live

I'm seeing many log messages like these:

W180801 21:55:27.492922 343 storage/node_liveness.go:501  [n17,hb] slow heartbeat took 4.5s
W180801 21:55:27.492930 343 storage/node_liveness.go:438  [n17,hb] failed node liveness heartbeat: context deadline exceeded
W180801 21:55:28.035125 344 sql/jobs/registry.go:300  canceling all jobs due to liveness failure

NTP sync and network connectivity seem fine.

Log file for this node:
cockroach.log

Metadata

Labels

  • A-kv-client: Relating to the KV client and the KV interface.
  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
  • O-community: Originated from the community.
  • S-2-temp-unavailability: Temp crashes or other availability problems. Can be worked around or resolved by restarting.
