sql,kv,storage: push column batch generation into kvserver #82323
Description
23.1 must-haves:
- introduce local fastpath
- figure out whether we want to support `Get` requests
- figure out what to do with tracing (i.e. `TraceKV` flag of `cFetcher`)
- what exactly do we want to show for the `KV Bytes Read` statistic?
23.1 nice-to-haves:
- support index joins (sql,kv: power ColIndexJoin by COL_BATCH_RESPONSE scan format #94807)
- propagate `estimatedRowCount` as the hint for `cFetcherWrapper` (sql, kv: propagate and utilize estimated row count hint for KV projection pushdown work #94850)
- support filter pushdown
- sql: columnar direct scan results in 10x more messages #99838
Later:
- kv: support SKIP LOCKED and different lock strengths with COL_BATCH_RESPONSE scan format #92950
- sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954
Is your feature request related to a problem? Please describe.
One known bottleneck for cockroach performance is so-called "scan speed": in practice, the speed at which we can scan data off of disk, encode it into the scan response, decode it, and then re-encode it into a columnar format, which is now used extensively in execution. This summary is misleading for a dedicated cluster: query execution often happens in the same process as the kvserver, so the encoding and decoding steps can be skipped. In multi-tenant deployments, however, the data must be transmitted over the network back to the SQL server. This can be particularly costly when the data is being served from a separate availability zone ([1], #71887). The above proposal has the potential to improve scan speed by 1) not decoding columns we don't need and 2) creating much smaller responses.
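To make the overhead concrete, here is a minimal sketch of the row-oriented round trip described above. All names and the fixed-width wire format are hypothetical stand-ins, not the actual CockroachDB encoding; the point is that every value is encoded row-wise, decoded, and then transposed into column vectors a second time.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// row stands in for a decoded KV scan row: one value per column.
type row []int64

// encodeRows mimics the server packing rows into a flat scan-response
// buffer (hypothetical format: 8 fixed bytes per value, row-major order).
func encodeRows(rows []row) []byte {
	var buf []byte
	for _, r := range rows {
		for _, v := range r {
			var tmp [8]byte
			binary.BigEndian.PutUint64(tmp[:], uint64(v))
			buf = append(buf, tmp[:]...)
		}
	}
	return buf
}

// decodeToColumns mimics the client decoding the row-major response and
// re-encoding it into columnar vectors -- the redundant step this issue
// proposes pushing into the kvserver.
func decodeToColumns(buf []byte, numCols int) [][]int64 {
	cols := make([][]int64, numCols)
	for off := 0; off < len(buf); {
		for c := 0; c < numCols; c++ {
			cols[c] = append(cols[c],
				int64(binary.BigEndian.Uint64(buf[off:off+8])))
			off += 8
		}
	}
	return cols
}

func main() {
	rows := []row{{1, 10, 100}, {2, 20, 200}}
	fmt.Println(decodeToColumns(encodeRows(rows), 3))
}
```

Shipping the columnar form directly would delete the `decodeToColumns` step (and the row-major intermediate buffer) from the SQL side entirely.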
Any eventual movement towards columnarization at the storage layer will need to have a corresponding read API. This issue posits that we should build the columnar read API first to gain experience.
Describe the solution you'd like
We should add an Apache Arrow batch response format that does column projection based on the `IndexFetchSpec`.
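A minimal sketch of the projection half of that idea, with hypothetical names (`fetchSpec`, `projectToBatch`) standing in for the real `IndexFetchSpec`-driven machinery: the server consults the spec's needed-column ordinals and materializes column vectors only for those, so unneeded columns are never decoded or shipped.

```go
package main

import "fmt"

// fetchSpec is a hypothetical stand-in for the projection-relevant part
// of IndexFetchSpec: the ordinals of the columns the query actually needs.
type fetchSpec struct {
	neededColOrdinals []int
}

// projectToBatch builds a columnar batch (keyed by column ordinal)
// containing only the needed columns, roughly as a server-side fetcher
// could. Columns absent from the spec never appear in the response.
func projectToBatch(rows [][]int64, spec fetchSpec) map[int][]int64 {
	batch := make(map[int][]int64, len(spec.neededColOrdinals))
	for _, r := range rows {
		for _, c := range spec.neededColOrdinals {
			batch[c] = append(batch[c], r[c])
		}
	}
	return batch
}

func main() {
	rows := [][]int64{{1, 10, 100}, {2, 20, 200}}
	// Only column ordinal 2 is needed; columns 0 and 1 are dropped
	// before the response is built.
	fmt.Println(projectToBatch(rows, fetchSpec{neededColOrdinals: []int{2}}))
}
```

In the real proposal the batch would be serialized as an Arrow record batch rather than a Go map, but the size win comes from the same projection step.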
Additional context
Relates very closely to (if not just adds exposition to) #71887.
@jordanlewis made a prototype here: #52863. At the time it showed a ~5% win in TPCH performance.
@RaduBerinde put in a ton of work to clean up how we specify the data to be fetched. Now there exists a small protobuf which could conceivably be transmitted with the scan request and used to describe how to decode the data.
- rowenc: Christmas cleanup #74357
- sql: clean up mutable not-null columns hack #74922
- catalog: add Index.InvertedColumnKeyType #75427
- sql: directly specify columns in TableReader #75114
- row: fetcher cleanup and improvements #75261
- sql: introduce proto with metadata needed by fetchers, use it in row.Fetcher #75633
- colfetcher: use IndexFetchSpec #75767
- sql: use IndexFetchSpec in TableReader #76394
- sql: remove public key column check when fetching #76788
- rowenc: various improvements to IndexFetchSpec initialization #76795
- span: use KeyColumns in span.Builder #76836
- sql: use IndexFetchSpec in JoinReader #76963
- sql: introduce special type for inverted index keys #77501
- sql: use IndexFetchSpec for inverted joiner #77875
- sql: use IndexFetchSpec for zigzag join #78295
[1] We're probably going to do #72593 to attack the cross-AZ network cost problem.
Jira issue: CRDB-16284
Epic: CRDB-26388