Skip to content

gossip: tight loop of loopback connections opened to tighten network graph #28517

@tbg

Description

@tbg

Seen in the logs for #28179. This is a 15 node cluster and for this node (srv-8) at startup Gossip decided to tighten the network graph. The node has 14 join flags, but unfortunately one of them is itself (so it's missing some other node instead). I don't think that matters, but I noticed it.

So Gossip decides that the own node is too far away (already absurd) and tries to connect to it directly (also absurd). This happens in a tight loop for ~4 minutes and appears to block startup of the node (perhaps by not letting it connect to more useful peers; not sure).

I180801 21:35:13.775866 39 gossip/client.go:129  [n27] started gossip client to 10.64.2.17:26257
I180801 21:35:13.845016 39 gossip/client.go:134  [n27] closing client to node 13 (10.64.2.17:26257): stopping outgoing client to node 13 (10.64.2.17:26257); already have incoming
I180801 21:35:13.845140 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.845194 119 gossip/gossip.go:1307  [n27] node has connected to cluster via gossip
I180801 21:35:13.846048 126 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.846493 126 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.846669 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.846734 128 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.847268 128 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.847447 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.847562 470 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.848084 470 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.848271 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.848401 285 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.849032 285 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.849142 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.849245 138 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.849827 138 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.849917 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.849980 719 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.850370 719 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.850596 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.850670 569 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.851254 569 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.851530 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.851604 1000 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.851987 1000 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.852255 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.852354 947 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.852797 947 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.852997 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.853057 1123 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.853497 1123 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.853644 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.853743 252 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.854004 252 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.854188 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.854244 876 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.854606 876 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.854870 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.854951 1492 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.855389 1492 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.855604 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.855660 1618 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257
I180801 21:35:13.856032 1618 gossip/client.go:134  [n27] closing client to node 27 (10.64.4.25:26257): stopping outgoing client to node 27 (10.64.4.25:26257); loopback connection
I180801 21:35:13.856203 119 gossip/gossip.go:1257  [n27] starting client to distant node 27 (7 > 5) to tighten network graph
I180801 21:35:13.856262 1615 gossip/client.go:129  [n27] started gossip client to 10.64.4.25:26257

Metadata

Metadata

Assignees

Labels

A-kv-gossipC-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.S-3-ux-surpriseIssue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions