sql,kv,storage: push column batch generation into kvserver #82323
Description
23.1 must-haves:
- introduce local fastpath
- figure out whether we want to support `Get` requests
- figure out what to do with tracing (i.e. `TraceKV` flag of `cFetcher`)
- what exactly do we want to show for the `KV Bytes Read` statistic?
23.1 nice-to-haves:
- support index joins (sql,kv: power ColIndexJoin by COL_BATCH_RESPONSE scan format #94807)
- propagate `estimatedRowCount` as the hint for `cFetcherWrapper` (sql, kv: propagate and utilize estimated row count hint for KV projection pushdown work #94850)
- support filter pushdown
- sql: columnar direct scan results in 10x more messages #99838
Later:
- kv: support SKIP LOCKED and different lock strengths with COL_BATCH_RESPONSE scan format #92950
- sql, kv: support non-enum user-defined types with COL_BATCH_RESPONSE scan format #92954
Is your feature request related to a problem? Please describe.
One known bottleneck for cockroach performance is so-called "scan speed": in practice, the speed at which we can scan data off of disk, encode it into the scan response, decode it, and then re-encode it into a columnar format, which is now used extensively in execution. This summary is misleading for a dedicated cluster: query execution often happens in the same process as the kvserver, so the encoding and decoding steps can be skipped. In multi-tenant deployments, however, the data must be transmitted over the network back to the SQL server. This can be particularly costly when the data is being served from a separate availability zone ([1], #71887). The above proposal has the potential to improve scan speed by 1) not decoding columns we don't need and 2) creating much smaller responses.
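To make the overhead concrete, here is a minimal sketch of the row-oriented round trip described above. All names and the fixed-width wire format are hypothetical stand-ins, not the actual CockroachDB encoding; the point is that every value is encoded row-wise, decoded, and then transposed into column vectors a second time.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// row stands in for a decoded KV scan row: one value per column.
type row []int64

// encodeRows mimics the server packing rows into a flat scan-response
// buffer (hypothetical format: 8 fixed bytes per value, row-major order).
func encodeRows(rows []row) []byte {
	var buf []byte
	for _, r := range rows {
		for _, v := range r {
			var tmp [8]byte
			binary.BigEndian.PutUint64(tmp[:], uint64(v))
			buf = append(buf, tmp[:]...)
		}
	}
	return buf
}

// decodeToColumns mimics the client decoding the row-major response and
// re-encoding it into columnar vectors -- the redundant step this issue
// proposes pushing into the kvserver.
func decodeToColumns(buf []byte, numCols int) [][]int64 {
	cols := make([][]int64, numCols)
	for off := 0; off < len(buf); {
		for c := 0; c < numCols; c++ {
			cols[c] = append(cols[c],
				int64(binary.BigEndian.Uint64(buf[off:off+8])))
			off += 8
		}
	}
	return cols
}

func main() {
	rows := []row{{1, 10, 100}, {2, 20, 200}}
	fmt.Println(decodeToColumns(encodeRows(rows), 3))
}
```

Shipping the columnar form directly would delete the `decodeToColumns` step (and the row-major intermediate buffer) from the SQL side entirely.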
Any eventual movement towards columnarization at the storage layer will need to have a corresponding read API. This issue posits that we should build the columnar read API first to gain experience.
Describe the solution you'd like
We should add an Apache Arrow batch response format that does column projection based on the `IndexFetchSpec`.
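A minimal sketch of the projection half of that idea, with hypothetical names (`fetchSpec`, `projectToBatch`) standing in for the real `IndexFetchSpec`-driven machinery: the server consults the spec's needed-column ordinals and materializes column vectors only for those, so unneeded columns are never decoded or shipped.

```go
package main

import "fmt"

// fetchSpec is a hypothetical stand-in for the projection-relevant part
// of IndexFetchSpec: the ordinals of the columns the query actually needs.
type fetchSpec struct {
	neededColOrdinals []int
}

// projectToBatch builds a columnar batch (keyed by column ordinal)
// containing only the needed columns, roughly as a server-side fetcher
// could. Columns absent from the spec never appear in the response.
func projectToBatch(rows [][]int64, spec fetchSpec) map[int][]int64 {
	batch := make(map[int][]int64, len(spec.neededColOrdinals))
	for _, r := range rows {
		for _, c := range spec.neededColOrdinals {
			batch[c] = append(batch[c], r[c])
		}
	}
	return batch
}

func main() {
	rows := [][]int64{{1, 10, 100}, {2, 20, 200}}
	// Only column ordinal 2 is needed; columns 0 and 1 are dropped
	// before the response is built.
	fmt.Println(projectToBatch(rows, fetchSpec{neededColOrdinals: []int{2}}))
}
```

In the real proposal the batch would be serialized as an Arrow record batch rather than a Go map, but the size win comes from the same projection step.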
Additional context
Relates very closely to (if not just adds exposition to) #71887.
@jordanlewis made a prototype here: #52863. At the time it showed a ~5% win in TPCH performance.
@RaduBerinde put in a ton of work to clean up how we specify the data to be fetched. Now there exists a small protobuf which could conceivably be transmitted with the scan request and used to describe how to decode the data.
- rowenc: Christmas cleanup #74357
- sql: clean up mutable not-null columns hack #74922
- catalog: add Index.InvertedColumnKeyType #75427
- sql: directly specify columns in TableReader #75114
- row: fetcher cleanup and improvements #75261
- sql: introduce proto with metadata needed by fetchers, use it in row.Fetcher #75633
- colfetcher: use IndexFetchSpec #75767
- sql: use IndexFetchSpec in TableReader #76394
- sql: remove public key column check when fetching #76788
- rowenc: various improvements to IndexFetchSpec initialization #76795
- span: use KeyColumns in span.Builder #76836
- sql: use IndexFetchSpec in JoinReader #76963
- sql: introduce special type for inverted index keys #77501
- sql: use IndexFetchSpec for inverted joiner #77875
- sql: use IndexFetchSpec for zigzag join #78295
[1] We're probably going to do #72593 to attack the cross-AZ network cost problem.
Jira issue: CRDB-16284
Epic: CRDB-26388