-
Notifications
You must be signed in to change notification settings - Fork 4.1k
execbuilder: plan directly to distsql specs #47473
Description
Remaining big items/questions
- how do we handle the closure of
planNodes that are wrapped into the physical plans?
sql: add support for aggregations and values in the new factory #50560 (review) goes into some details with the difficulties, but the TL;DR is that someplanNodes do not have equivalent processor specs, so we need to have a way to "embed" them into the physical plans (this is done viawrapPlanmethod which probably doesn't need any modifications) and clean up after them once the query is done (this is the part needs figuring out). With the old factory, that cleanup is done byplanTop.closemethod which closes all trees in the query. However, with the new factory we don't have full trees - we might have a collection of randomplanNodes that might not be connected between each other. - how do we handle subqueries? sql: figure out how to handle subqueries by distSQLSpecExecFactory #51095
- how do we connect appropriate
planNodes in the physical plan? sql: figure out how to handle subqueries by distSQLSpecExecFactory #51095 - what is our testing story?
We probably will rely mostly on existing logic tests but also we need to add a differential testing harness before we're comfortable with removing the old factory. This is tracked in sql: implement differential testing of physical plans produced by the old and new factories #50610.
Here is the list of all unimplemented methods that was added in #49348 that introduced the new factory (the ones that are checked either have been already merged or have a working work-in-progress commits). All methods are divided into two categories - one is methods in which we can construct the spec directly because there is a corresponding processor core, and another are planNodes that don't have an equivalent processor and might need to be wrapped (it is possible that the second category might be actually reduced to just a handful of items that require figuring out the necessary plumbing for handling the wrapped planNodes and all of the methods below will "just work" 🤞 ).
Can construct processor specs directly:
- ConstructValues (this is checked but depends on the answer to the first question above)
- ConstructScan (non virtual scan)
- ConstructFilter
- ConstructInvertedFilter
- ConstructSimpleProject
- ConstructRender
- ConstructHashJoin
- ConstructMergeJoin
- ConstructGroupBy
- ConstructScalarGroupBy
- ConstructDistinct
- ConstructSetOp
- ConstructSort
- ConstructOrdinality
- ConstructIndexJoin
- ConstructLookupJoin (sql: distsql direct planning of lookup joins #74543)
- ConstructInvertedJoin
- ConstructZigzagJoin
- ConstructLimit
- ConstructProjectSet
- ConstructWindow
Need to have wrapped planNodes:
- ConstructValues
- ConstructScan (virtual scan)
- ConstructApplyJoin
- ConstructMax1Row
- ConstructExplainOpt
- ConstructExplain
- ConstructExplain (plan)
- ConstructShowTrace
- ConstructInsert
- ConstructInsertFastPath
- ConstructUpdate
- ConstructUpsert
- ConstructDelete
- ConstructDeleteRange
- ConstructCreateTable
- ConstructCreateView
- ConstructSequenceSelect
- ConstructSaveTable
- ConstructErrorIfRows
- ConstructOpaque
- ConstructAlterTableSplit
- ConstructAlterTableUnsplit
- ConstructAlterTableUnsplitAll
- ConstructAlterTableRelocate
- ConstructBuffer
- ConstructScanBuffer
- ConstructRecursiveCTE
- ConstructControlJobs
- ConstructCancelQueries
- ConstructCancelSessions
- ConstructExport
Miscellaneous items:
- RenameColumns
- ConstructPlan
- populate index usage statistics in the DistSQL spec factory
Background info
Currently, the last stage in the optimizer is to use the execbuilder.Builder to construct a planNode tree from a memo expression tree. This planNode tree is subsequently converted to a collection of ProcessorSpecs in the distsql physical planner:
cockroach/pkg/sql/distsql_physical_planner.go
Line 2293 in 1e9f570
| func (dsp *DistSQLPlanner) createPlanForNode( |
We need to get rid of this redundant conversion and create ProcessorSpecs directly.
For 20.2, our focus should be on adding an off-by-default option to remove this redundant planning phase for the kv --read-percent=100 workload. I envision this as creating a new implementation of the exec.Factory implementation that is swapped in when a cluster setting is set.
Supporting kv --read-percent=100 is mostly a question of implementing ConstructScan and nothing else. However, this will probably teach us a lot of what we need to do to move over and how to do it. My hope is that once we get the TableReaderSpecs created directly, we'll have a concrete gameplan on moving over the rest of the processor spec creation.
Epic: CRDB-79
Jira issue: CRDB-4407
Metadata
Metadata
Assignees
Labels
Type
Projects
Status