sql,kv,storage: push column batch generation into kvserver #82323

@ajwerner

Description

23.1 must-haves:

  • introduce local fastpath
  • figure out whether we want to support Get requests
  • figure out what to do about tracing (i.e., the TraceKV flag of the cFetcher)
  • what exactly do we want to show for the KV Bytes Read statistic?

23.1 nice-to-haves:

Later:


Is your feature request related to a problem? Please describe.
One known bottleneck for cockroach performance is so-called "scan speed". In practice, this is the speed to scan data off of disk, encode it into the scan response, decode it, then re-encode it into a columnar format. The columnar format is now used extensively in execution. The above summary is misleading in a dedicated cluster: often the query execution happens in the same process as the kvserver, so the encoding and decoding step can be skipped. In multi-tenant deployments, the data must be transmitted over the network back to the server. This can be particularly costly when the data is being served from a separate availability zone ([1], #71887). The above proposal has the potential to improve the speed by 1) not decoding columns we don't need and 2) creating much smaller responses.

Any eventual move toward columnarization at the storage layer will need a corresponding read API. This issue posits that we should build the columnar read API first to gain experience.

Describe the solution you'd like

We should add an Apache Arrow batch response format that performs column projection based on the IndexFetchSpec.
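A minimal sketch of what spec-driven projection might look like. The `FetchSpec` and `ColBatch` types below are invented stand-ins (not the real IndexFetchSpec protobuf or an actual Arrow record batch): the point is that the server consults a small list of needed column ordinals and materializes only those columns, so unneeded columns are never decoded or shipped.

```go
package main

import "fmt"

// FetchSpec stands in for the projection portion of an IndexFetchSpec.
type FetchSpec struct {
	NeededColumns []int // ordinals of the columns the query actually reads
}

// ColBatch is a toy stand-in for an Apache Arrow record batch.
type ColBatch struct {
	Ordinals []int     // source-column ordinal for each vector
	Vectors  [][]int64 // one vector per projected column
}

// project builds a columnar batch containing only the columns named in spec.
// rows is a row-oriented source: rows[i][j] is column j of row i.
func project(spec FetchSpec, rows [][]int64) ColBatch {
	batch := ColBatch{Ordinals: spec.NeededColumns}
	for _, ord := range spec.NeededColumns {
		vec := make([]int64, len(rows))
		for i, r := range rows {
			vec[i] = r[ord]
		}
		batch.Vectors = append(batch.Vectors, vec)
	}
	return batch
}

func main() {
	rows := [][]int64{{1, 100, 7}, {2, 200, 8}}
	// Project columns 0 and 2; column 1 is never materialized.
	b := project(FetchSpec{NeededColumns: []int{0, 2}}, rows)
	fmt.Println(b.Vectors) // prints [[1 2] [7 8]]
}
```

In the real system the input would be encoded KV pairs rather than decoded rows, which is exactly where skipping unneeded columns saves decode work as well as response bytes.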

Additional context

Relates very closely to #71887, if it doesn't simply add exposition to it.

@jordanlewis made a prototype here: #52863. At the time it showed a ~5% win in TPCH performance.

@RaduBerinde put in a ton of work to clean up how we specify the data to be fetched. Now there exists a small protobuf that could conceivably be transmitted with the scan request and used to describe how to decode the data.
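The shape this enables might look roughly like the following. Field names and the `responseFormat` helper are invented for illustration and do not correspond to the real kvpb request types: the idea is just that a compact spec rides along with the scan request and switches the server from returning raw KV bytes to decoding server-side.

```go
package main

import "fmt"

// IndexFetchSpec here is a hypothetical stand-in for the small protobuf
// describing how to decode fetched data; the real message has more fields.
type IndexFetchSpec struct {
	TableID       int
	IndexID       int
	NeededColumns []int
}

// ScanRequest sketches a scan that optionally carries the fetch spec.
type ScanRequest struct {
	StartKey, EndKey string
	FetchSpec        *IndexFetchSpec // nil means today's raw-KV response
}

// responseFormat picks the wire format the server would produce.
func responseFormat(req ScanRequest) string {
	if req.FetchSpec != nil {
		return "columnar" // decode server-side, driven by the spec
	}
	return "raw-kv" // current behavior: the client decodes
}

func main() {
	req := ScanRequest{
		StartKey: "/Table/52/1", EndKey: "/Table/52/2",
		FetchSpec: &IndexFetchSpec{TableID: 52, IndexID: 1, NeededColumns: []int{0, 2}},
	}
	fmt.Println(responseFormat(req)) // prints columnar
}
```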


[1] We're probably going to do #72593 to attack the cross-AZ network cost problem.

Jira issue: CRDB-16284
Epic: CRDB-26388

Metadata

Labels

  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
  • T-sql-queries: SQL Queries Team
  • meta-issue: Contains a list of several other issues.


Projects

Status: Backlog
