
stability: partitioned gossip network caused by ping-ponging of r1's lease #24753

@vivekmenezes

While preparing an alpha release, everything went fine until I restarted node 1 on adriatic. After node 1 came back up, the node status page

https://cockroach-adriatic-0003.crdb.io:8080/_status/nodes

shows some nodes reporting:

        "liveness.epochincrements": 0,
        "liveness.heartbeatfailures": 84,
        "liveness.heartbeatlatency-max": 4718591,
        "liveness.heartbeatlatency-p50": 3145727,
        "liveness.heartbeatlatency-p75": 4194303,
        "liveness.heartbeatlatency-p90": 4194303,
        "liveness.heartbeatlatency-p99": 4718591,
        "liveness.heartbeatlatency-p99.9": 4718591,
        "liveness.heartbeatlatency-p99.99": 4718591,
        "liveness.heartbeatlatency-p99.999": 4718591,
        "liveness.heartbeatsuccesses": 1103,
        "liveness.livenodes": 1,

Note livenodes=1 here.

and others reporting:

        "liveness.epochincrements": 0,
        "liveness.heartbeatfailures": 63,
        "liveness.heartbeatlatency-max": 4980735,
        "liveness.heartbeatlatency-p50": 3407871,
        "liveness.heartbeatlatency-p75": 3538943,
        "liveness.heartbeatlatency-p90": 4063231,
        "liveness.heartbeatlatency-p99": 4980735,
        "liveness.heartbeatlatency-p99.9": 4980735,
        "liveness.heartbeatlatency-p99.99": 4980735,
        "liveness.heartbeatlatency-p99.999": 4980735,
        "liveness.heartbeatsuccesses": 45806,
        "liveness.livenodes": 6,

These nodes correctly report livenodes=6.

I believe the admin UI uses the liveness.livenodes metric to report liveness information.
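
For anyone trying to spot the same split view, here is a minimal sketch that groups nodes by the `liveness.livenodes` value each one reports and flags disagreement. It assumes the per-node metrics have already been pulled out of the `_status/nodes` response into a plain dict; the function names and node IDs are made up for illustration, and only the metric values come from the two excerpts above.

```python
from collections import defaultdict


def group_by_livenodes(metrics_by_node):
    """Group node IDs by the liveness.livenodes value each node reports."""
    groups = defaultdict(list)
    for node_id, metrics in metrics_by_node.items():
        groups[metrics.get("liveness.livenodes")].append(node_id)
    return dict(groups)


def liveness_disagrees(metrics_by_node):
    """Return True when nodes report more than one distinct live-node count."""
    return len(group_by_livenodes(metrics_by_node)) > 1


# Metric values from the two excerpts in this issue; node IDs are hypothetical.
sample = {
    "node-a": {"liveness.livenodes": 1, "liveness.heartbeatfailures": 84},
    "node-b": {"liveness.livenodes": 6, "liveness.heartbeatfailures": 63},
}

print(group_by_livenodes(sample))
print(liveness_disagrees(sample))
```

If every node agrees on the count, `liveness_disagrees` returns False; a mix of values like the one above suggests the nodes do not share a single view of liveness.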
