storage: node does not become live after restart until many hours later #28179
Closed
Labels
- A-kv-client: Relating to the KV client and the KV interface.
- C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
- O-community: Originated from the community
- S-2-temp-unavailability: Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Description
BUG REPORT
Please describe the issue you observed, and any steps we can take to reproduce it:
- Which version of CockroachDB are you using?
  2.0.4
- What did you do?
  Rebooted a node
- What did you expect to see?
  Node to come up
- What did you see instead?
  cockroachdb starts but node never becomes live
- What was the impact?
  Minor, rest of the cluster is live
I'm seeing lots of messages like this:
W180801 21:55:27.492922 343 storage/node_liveness.go:501 [n17,hb] slow heartbeat took 4.5s
W180801 21:55:27.492930 343 storage/node_liveness.go:438 [n17,hb] failed node liveness heartbeat: context deadline exceeded
W180801 21:55:28.035125 344 sql/jobs/registry.go:300 canceling all jobs due to liveness failure
NTP sync and network connectivity seem good.
Log file for this node:
cockroach.log
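For anyone triaging a similar log, here is a minimal sketch for pulling the slow-heartbeat warnings out of a log file so their timing can be compared against the restart. It assumes the line layout matches the three warnings quoted above and that the duration is printed in plain seconds (as in `4.5s`); other duration formats are not handled. The names `SLOW_HB` and `slow_heartbeats` are illustrative only and not part of CockroachDB.

```python
import re

# Assumed layout, taken from the excerpt in this report:
# W180801 21:55:27.492922 343 storage/node_liveness.go:501 [n17,hb] slow heartbeat took 4.5s
SLOW_HB = re.compile(
    r"^W(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) .*"
    r"\[(?P<node>n\d+),hb\] slow heartbeat took (?P<secs>\d+(?:\.\d+)?)s$"
)


def slow_heartbeats(lines):
    """Return (date, time, node, seconds) for each slow-heartbeat warning."""
    found = []
    for line in lines:
        m = SLOW_HB.match(line.strip())
        if m:
            found.append((m["date"], m["time"], m["node"], float(m["secs"])))
    return found
```

It can be run over the attached file with something like `slow_heartbeats(open("cockroach.log"))`; lines that do not match the assumed layout are skipped.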