
core: Don't let nodes from one cluster interfere with another #15801

@a-robinson

Description


We do a good job in the gossip layer of not letting nodes from one cluster join another, but we don't enforce this at any other layer of the stack. If one cluster, cluster X, used to have a node at the same address as one of the nodes in cluster Y, then bad things can happen, because the nodes in X will keep talking to that address in ways that can interfere with cluster Y. One instance of this problem was seen in, and is explained by, my comment on #15591 (comment):

  • At some point before I got involved, someone had set up a cross-cloud cluster in clouds A and B. After their testing was done, they took down the nodes in A but left up the nodes in B.
  • Later, while I was around, we brought up a new cluster in cloud A with the exact same IP addresses.
  • Although the nodes in B from the old cluster couldn't properly join the new cluster via gossip due to the cluster ID check, they could still talk to the nodes in A because those nodes were at the same IP addresses. In practice, it looks like the old nodes left over in B do two main things:
    • Open raft transport streams and send messages to the new nodes in A. If the node/store ID of the node in A doesn't match what the node in B expects, the requests are simply rejected. However, if the IDs do match, bad things can happen, such as the crash in the PDF I attached above, in which one of the new nodes crashed because it didn't have a raft group in place for the request.
    • Try to update the node liveness table. This is presumably why the IPs on the node list page of the UI changed every time we refreshed it: sometimes the new node had most recently updated the liveness record, and sometimes the old node had.

We can wait and see whether anyone else runs into something like this before prioritizing it, since I don't think it's likely to be common, but it will be pretty confusing for anyone it does happen to.

Metadata


Assignees: No one assigned

Labels: No labels

Type: No type

Projects: No projects

Relationships: None yet

Development: No branches or pull requests
