sql: wide rows handled very inefficiently in multi-tenant configuration

This is a write-up of an issue that many of us are aware of but it's not explicitly written down anywhere AFAICT.

DistSQL was introduced to solve two major problems:
 1. distributing large queries to use the resources across multiple machines
 2. reduce the amount of data that needs to be transferred out of the leaseholder node.

DistSQL is currently disabled for multi-tenant. In most of the related discussions related to addressing that, the focus seems to be on the first aspect. However, I believe that the second aspect is more important.

For example, on Serverless `SELECT COUNT(*) FROM t` has to transfer the *entire table* between the host cluster and the tenant. It's the same for a simple aggregation like `SELECT SUM(x) FROM t`. In a traditional deployment, DistSQL solved these issues nicely - we only transfer out a single value from each node that stores a piece of the table.

We have seen various instances where users have wide tables (many columns), so this is a no-go for many usecases.

There are two directions for attacking this issue:
 - add more intelligence to KV. The most important aspect would be projecting only a subset of columns. A specific proposal (@jordanlewis prototyped this idea) is to move the logic for building columnar batches into KV, which seems very promising (especially in terms of benefit vs effort ratio).
 - run some subset of DistSQL on the host cluster. A more specific idea is to have "rental" DistSQL pods available on the host cluster side that a tenant can make temporary use of.

CC @ajstorm 

Jira issue: CRDB-10815

Epic: CRDB-14837

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: wide rows handled very inefficiently in multi-tenant configuration #71887

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

sql: wide rows handled very inefficiently in multi-tenant configuration #71887

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions