sql/kv: better story for fast churn / repaving of nodes wrt node IDs #47470

@knz

There are users out there who think it's a good idea to regularly drop a node and create a new node to replace it. It's also called "churning" or "repaving".

It seems OK on the surface (it fits nicely in the CockroachDB vision) but there's a technical gotcha: every new node gets a fresh node ID.

The regular addition of new node IDs to the system has been neither designed for nor thoroughly tested. (In fact, certain parts of CockroachDB assume that node IDs remain "small" even though the data type is 32-bit.)

We need to create a list of features / mechanisms that rely on node IDs and audit them for resource consumption or correctness issues that grow with the total number of node IDs issued:

  • unique_rowid() packs some timestamp bits together with some node ID bits. Once the node ID grows beyond the bits reserved for it, we lose the pseudo time-ordering of the rowid column. (This seems like a major gotcha.)
  • memory and disk usage (think: node descriptors that linger)
  • network traffic (think: gossip)
  • keying of certain data structures (think: caches, lookup accelerators, etc.)
  • generation of certain IDs (txn IDs, SQL session IDs, query cancellation IDs, etc.)
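
To make the unique_rowid() item concrete, here is a minimal sketch of how time-ordering can break. It assumes a 15-bit node ID field (chosen to match the 2^15 figure used in this issue) that is XORed into the low bits of a shifted timestamp. `packRowID`, `timeOrdered` and `nodeIDBits` are hypothetical names for illustration and are not taken from the CockroachDB source.

```go
package main

import "fmt"

// nodeIDBits is an ASSUMED width for the node ID field. It matches the 2^15
// threshold mentioned in this issue; it is not read from the real source.
const nodeIDBits = 15

// packRowID is a hypothetical stand-in for a unique_rowid()-style encoding:
// timestamp ticks in the high bits, node ID XORed into the low bits.
func packRowID(tsTicks uint64, nodeID uint32) uint64 {
	return (tsTicks << nodeIDBits) ^ uint64(nodeID)
}

// timeOrdered reports whether an ID generated at an earlier tick sorts
// strictly before an ID generated at the next tick.
func timeOrdered(earlierNode, laterNode uint32) bool {
	const earlier, later = 1000, 1001
	return packRowID(earlier, earlierNode) < packRowID(later, laterNode)
}

func main() {
	small := uint32(1<<nodeIDBits - 1) // largest ID that fits in the field
	large := uint32(1<<nodeIDBits + 1) // spills into the timestamp bits
	fmt.Println("small ID ordered:", timeOrdered(small, 1))
	fmt.Println("large ID ordered:", timeOrdered(large, 1))
}
```

Under these assumptions the first check holds and the second does not: the high bit of the large node ID lands in the timestamp portion, so a row written earlier no longer sorts before a row written later.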

Once we have this list, we need to create dedicated tests that inject large numbers of node IDs and exercise these items to verify that they behave in a way that's reasonable and expected.

For context, a 30-node cluster repaved every day (e.g. to follow OS upgrades) means that we reach node ID 2^15 after just 3 years. We don't want a user/customer knocking at our door saying “I have this 3-year-old deployment that was just fine a minute ago, and today all my traffic has stopped.”

Jira issue: CRDB-4408

Labels: A-kv-decom-rolling-restart (Decommission and Rolling Restarts), C-investigation (Further steps needed to qualify. C-label will change.), T-kv (KV Team)
