execbuilder: plan directly to distsql specs

**Remaining big items/questions**
- [x] how do we handle the closure of `planNode`s that are wrapped into the physical plans?
https://github.com/cockroachdb/cockroach/pull/50560#pullrequestreview-439571382 goes into some detail about the difficulties, but the TL;DR is that some `planNode`s do not have equivalent processor specs, so we need a way to "embed" them into the physical plans (this is done via the `wrapPlan` method, which probably doesn't need any modifications) and to clean up after them once the query is done (this is the part that needs figuring out). With the old factory, that cleanup is done by the `planTop.close` method, which closes all trees in the query. However, with the new factory we don't have full trees: we might have a collection of arbitrary `planNode`s that might not be connected to each other.
- [ ] how do we handle subqueries? #51095
- [ ] how do we connect appropriate `planNode`s in the physical plan? #51095
- [ ] what is our testing story?
We will probably rely mostly on existing logic tests, but we also need to add a differential testing harness before we're comfortable removing the old factory. This is tracked in #50610.
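
To make the differential testing idea more concrete, here is a minimal sketch of what such a harness could look like. Everything in it is hypothetical (the `row` and `runner` types, `diffQuery`, the fake runners in `main`); the actual harness is tracked in #50610 and may look quite different. The sketch runs the same query through the old and the new planning path and compares the results after sorting, so it deliberately ignores row order.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// row is a simplified result row (hypothetical; real rows would hold datums).
type row []string

// runner executes a query through one planning path (old or new factory).
// It is a hypothetical stand-in for whatever the real harness would call.
type runner func(query string) ([]row, error)

// sortedCopy returns the rows in a deterministic order so that queries
// without ORDER BY still compare equal. Ordering is therefore not checked.
func sortedCopy(rows []row) []row {
	out := append([]row(nil), rows...)
	sort.Slice(out, func(i, j int) bool {
		return fmt.Sprint(out[i]) < fmt.Sprint(out[j])
	})
	return out
}

// diffQuery runs query through both paths and reports any disagreement.
func diffQuery(query string, oldPath, newPath runner) error {
	oldRows, oldErr := oldPath(query)
	newRows, newErr := newPath(query)
	if (oldErr == nil) != (newErr == nil) {
		return fmt.Errorf("%q: error mismatch: old=%v, new=%v", query, oldErr, newErr)
	}
	if oldErr != nil {
		// Both paths failed; comparing error text is left out of this sketch.
		return nil
	}
	if !reflect.DeepEqual(sortedCopy(oldRows), sortedCopy(newRows)) {
		return fmt.Errorf("%q: result mismatch: old=%v, new=%v", query, oldRows, newRows)
	}
	return nil
}

func main() {
	// Fake runners that return the same rows in a different order.
	oldPath := func(string) ([]row, error) { return []row{{"1", "a"}, {"2", "b"}}, nil }
	newPath := func(string) ([]row, error) { return []row{{"2", "b"}, {"1", "a"}}, nil }
	fmt.Println(diffQuery("SELECT k, v FROM kv", oldPath, newPath))
}
```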

----

Here is the list of all unimplemented methods that were added in #49348, which introduced the new factory (the checked ones either have already been merged or have a working work-in-progress commit). The methods are divided into two categories: methods for which we can construct the spec directly because there is a corresponding processor core, and `planNode`s that don't have an equivalent processor and might need to be wrapped (it is possible that the second category reduces to just a handful of items that require figuring out the necessary plumbing for handling the wrapped `planNode`s, after which all of the methods below will "just work" 🤞).
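
As an illustration of the "plumbing" for wrapped `planNode`s, here is one possible shape for the cleanup problem: the physical plan itself remembers every node it wraps and closes them all when the query finishes, so no connected tree is required. This is only a sketch under assumptions and not necessarily how it was (or will be) implemented. The types `physicalPlan`, `processorSpec`, and `fakeNode`, and the methods `addCore`, `wrapNode`, and `close`, are all made up for this example, and `planNode` is reduced to a single `Close` method.

```go
package main

import "fmt"

// planNode is a minimal stand-in for the real interface; only Close matters here.
type planNode interface {
	Close()
}

// fakeNode is a hypothetical node that records whether it has been closed.
type fakeNode struct {
	name   string
	closed bool
}

func (n *fakeNode) Close() { n.closed = true }

// processorSpec is a simplified, hypothetical spec: either a named processor
// core or a wrapper around a planNode that has no equivalent core.
type processorSpec struct {
	core    string
	wrapped planNode
}

// physicalPlan tracks every wrapped planNode so that all of them can be
// closed later, even if they are not connected to each other.
type physicalPlan struct {
	processors []processorSpec
	toClose    []planNode
}

// addCore adds a processor that has a direct spec.
func (p *physicalPlan) addCore(core string) {
	p.processors = append(p.processors, processorSpec{core: core})
}

// wrapNode embeds a planNode into the plan and registers it for cleanup.
func (p *physicalPlan) wrapNode(n planNode) {
	p.processors = append(p.processors, processorSpec{core: "planNodeWrapper", wrapped: n})
	p.toClose = append(p.toClose, n)
}

// close releases every wrapped planNode exactly once.
func (p *physicalPlan) close() {
	for _, n := range p.toClose {
		n.Close()
	}
	p.toClose = nil
}

func main() {
	var plan physicalPlan
	values := &fakeNode{name: "values"}
	explain := &fakeNode{name: "explain"}
	plan.addCore("tableReader")
	plan.wrapNode(values)
	plan.wrapNode(explain)
	plan.close()
	fmt.Println(values.closed, explain.closed)
}
```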

Can construct processor specs directly:
- [x] ConstructValues (this is checked but depends on the answer to the first question above)
- [x] ConstructScan (non virtual scan)
- [x] ConstructFilter
- [ ] ConstructInvertedFilter
- [x] ConstructSimpleProject
- [x] ConstructRender
- [x] ConstructHashJoin
- [x] ConstructMergeJoin
- [x] ConstructGroupBy
- [x] ConstructScalarGroupBy
- [x] ConstructDistinct
- [ ] ConstructSetOp
- [x] ConstructSort
- [ ] ConstructOrdinality
- [ ] ConstructIndexJoin
- [ ] ConstructLookupJoin (#74543)
- [ ] ConstructInvertedJoin
- [x] ConstructZigzagJoin
- [x] ConstructLimit
- [x] ConstructProjectSet
- [ ] ConstructWindow

Need to have wrapped `planNode`s:
- [x] ConstructValues
- [x] ConstructScan (virtual scan)
- [ ] ConstructApplyJoin
- [ ] ConstructMax1Row
- [ ] ConstructExplainOpt
- [x] ConstructExplain
- [ ] ConstructExplain (plan)
- [ ] ConstructShowTrace
- [ ] ConstructInsert
- [ ] ConstructInsertFastPath
- [ ] ConstructUpdate
- [ ] ConstructUpsert
- [ ] ConstructDelete
- [ ] ConstructDeleteRange
- [ ] ConstructCreateTable
- [ ] ConstructCreateView
- [ ] ConstructSequenceSelect
- [ ] ConstructSaveTable
- [ ] ConstructErrorIfRows
- [x] ConstructOpaque
- [ ] ConstructAlterTableSplit
- [ ] ConstructAlterTableUnsplit
- [ ] ConstructAlterTableUnsplitAll
- [ ] ConstructAlterTableRelocate
- [ ] ConstructBuffer
- [ ] ConstructScanBuffer
- [ ] ConstructRecursiveCTE
- [ ] ConstructControlJobs
- [ ] ConstructCancelQueries
- [ ] ConstructCancelSessions
- [ ] ConstructExport

Miscellaneous items:
- [x] RenameColumns
- [x] ConstructPlan
- [ ] populate index usage statistics in the DistSQL spec factory

----

**Background info**

Currently, the last stage in the optimizer is to use the `execbuilder.Builder` to construct a `planNode` tree from a `memo` expression tree. This `planNode` tree is subsequently converted to a collection of `ProcessorSpecs` in the distsql physical planner: https://github.com/cockroachdb/cockroach/blob/1e9f570afae702470a5e9893a6bf1a5818bdfb43/pkg/sql/distsql_physical_planner.go#L2293

We need to get rid of this redundant conversion and create `ProcessorSpecs` directly.

For 20.2, our focus should be on adding an off-by-default option to remove this redundant planning phase for the `kv --read-percent=100` workload. I envision this as a new implementation of the `exec.Factory` interface that is swapped in when a cluster setting is set.

Supporting `kv --read-percent=100` is mostly a question of implementing `ConstructScan` and nothing else. However, this will probably teach us a lot about what we need to do to move over and how to do it. My hope is that once we get the `TableReaderSpec`s created directly, we'll have a concrete game plan for moving over the rest of the processor spec creation.
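
A rough sketch of the swap described above, assuming a drastically simplified factory interface with only `ConstructScan`. None of these names match the real `exec.Factory` signatures: `factory`, `planNodeFactory`, `specFactory`, `newFactory`, `scanNode`, and `tableReaderSpec` are hypothetical stand-ins, and the boolean argument stands in for the cluster setting.

```go
package main

import "fmt"

// tableReaderSpec is a simplified, hypothetical stand-in for TableReaderSpec.
type tableReaderSpec struct {
	Table string
	Index string
}

// scanNode is a simplified, hypothetical stand-in for the scan planNode.
type scanNode struct {
	table string
	index string
}

// factory is a tiny, hypothetical subset of a factory interface.
type factory interface {
	ConstructScan(table, index string) (interface{}, error)
}

// planNodeFactory models the existing path: it builds a planNode that a
// later planning phase would convert into processor specs.
type planNodeFactory struct{}

func (planNodeFactory) ConstructScan(table, index string) (interface{}, error) {
	return &scanNode{table: table, index: index}, nil
}

// specFactory models the new path: it builds the processor spec directly.
type specFactory struct{}

func (specFactory) ConstructScan(table, index string) (interface{}, error) {
	return &tableReaderSpec{Table: table, Index: index}, nil
}

// newFactory picks an implementation based on a flag that stands in for
// the off-by-default cluster setting.
func newFactory(directSpecs bool) factory {
	if directSpecs {
		return specFactory{}
	}
	return planNodeFactory{}
}

func main() {
	for _, enabled := range []bool{false, true} {
		plan, err := newFactory(enabled).ConstructScan("kv", "primary")
		if err != nil {
			panic(err)
		}
		fmt.Printf("setting=%t -> %T\n", enabled, plan)
	}
}
```

Because both implementations satisfy the same interface, the caller that builds the plan does not need to know which one is active, which is what makes an off-by-default setting cheap to add.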

Epic: CRDB-79

Jira issue: CRDB-4407
