Enhancement
The current implementation of cost calculation is tightly coupled with physical optimization.
During physical optimization, the optimizer frequently changes the plan structure and updates the current cost along with each change.
Here is an example:
```go
func addPushedDownSelection(t *task) {
	tableReader := t.Plan.(*TableReader)
	pushedDownFilters := PushDownExprs(tableReader.filters) // PHY-OPT
	selCost := t.Count() * CopCPUFactor                     // COST
	selection := NewSelection(pushedDownFilters)            // PHY-OPT
	selection.cost = selCost                                // COST
	t.cost += selCost                                       // COST
	selection.Child = tableReader.Child                     // PHY-OPT
	tableReader.Child = selection                           // PHY-OPT: insert the selection below the reader
}
```
addPushedDownSelection pushes a selection down. The comments COST and PHY-OPT mark whether a line performs cost calculation or changes the plan during physical optimization, and as you can see, the two kinds of logic are tightly interleaved: every structural change must be paired with a matching cost update.
The current implementation is hard to maintain and has already caused many issues: #32675 #32672 #32698 #27189 #30103 #32362.
To solve this problem thoroughly, we decided to refactor the implementation of cost calculation.
We will introduce a new interface method, Plan.CalCost() float64, and remove all cost-calculation code from physical optimization. With that change, the example above becomes:
```go
func addPushedDownSelection(t *task) {
	tableReader := t.Plan.(*TableReader)
	pushedDownFilters := PushDownExprs(tableReader.filters) // PHY-OPT
	selection := NewSelection(pushedDownFilters)            // PHY-OPT
	selection.Child = tableReader.Child                     // PHY-OPT
	tableReader.Child = selection                           // PHY-OPT: insert the selection below the reader
}
```
All cost-calculation code will instead live in the new, standard interface method Plan.CalCost() float64.
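To make the separation concrete, here is a minimal, self-contained sketch under stated assumptions. The Plan interface, the TableScan/Selection/TableReader types, the Rows method (standing in for StatsCount), the selectivity parameter (standing in for the pushed-down filters), and all cost factors are hypothetical simplifications, not TiDB's actual API. The rewrite only rewires children, and each node computes its cost bottom-up from its children whenever CalCost is called.

```go
package main

import "fmt"

// Plan is a hypothetical, stripped-down plan interface: every node can
// compute its own cost from its children on demand.
type Plan interface {
	CalCost() float64
	Rows() float64 // estimated output rows (stand-in for StatsCount)
}

// Hypothetical cost factors; real ones would come from session variables.
const (
	scanFactor   = 2.0
	copCPUFactor = 1.0
	netFactor    = 4.0
)

// TableScan reads rows from storage.
type TableScan struct{ rows float64 }

func (s *TableScan) CalCost() float64 { return s.rows * scanFactor }
func (s *TableScan) Rows() float64    { return s.rows }

// Selection evaluates filters on every input row and keeps a fraction of them.
type Selection struct {
	child       Plan
	selectivity float64 // hypothetical stand-in for the pushed-down filters
}

func (s *Selection) CalCost() float64 {
	return s.child.CalCost() + s.child.Rows()*copCPUFactor
}

func (s *Selection) Rows() float64 { return s.child.Rows() * s.selectivity }

// TableReader ships the rows produced by its child over the network.
type TableReader struct{ child Plan }

func (r *TableReader) CalCost() float64 {
	return r.child.CalCost() + r.child.Rows()*netFactor
}

func (r *TableReader) Rows() float64 { return r.child.Rows() }

// addPushedDownSelection only rewires the plan; it never touches a cost.
func addPushedDownSelection(r *TableReader, selectivity float64) {
	r.child = &Selection{child: r.child, selectivity: selectivity}
}

func main() {
	reader := &TableReader{child: &TableScan{rows: 100}}
	fmt.Println(reader.CalCost()) // 600: 100*2 scan + 100*4 net
	addPushedDownSelection(reader, 0.25)
	fmt.Println(reader.CalCost()) // 400: 200 scan + 100*1 filter + 25*4 net
}
```

Because no cost is cached on the nodes, the rewritten plan is priced correctly without the rewrite doing any bookkeeping: the network term shrinks from 100 rows to 25 simply because CalCost is evaluated on the new tree.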
Below is an example for IndexReader (in this demo, the method is named CalPlanCost and takes the task type as a parameter):
```go
func (p *PhysicalIndexReader) CalPlanCost(taskType property.TaskType) float64 {
	p.planCost = p.indexPlan.CalPlanCost(property.CopSingleReadTaskType)   // COST: child's cost
	p.planCost += p.indexPlan.StatsCount() * getRowSize(p) * netFactor     // COST: net I/O cost
	p.planCost += getSeekCost(p)                                           // COST: net seek cost
	p.planCost /= float64(p.ctx.GetSessionVars().DistSQLScanConcurrency()) // COST: consider concurrency
	return p.planCost
}
```
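As a worked instance of the IndexReader formula, the sketch below evaluates the same four steps on plain numbers. The function name indexReaderCost, its parameters, and the input values are hypothetical; in TiDB they come from the child plan, statistics, and session variables.

```go
package main

import "fmt"

// indexReaderCost follows the same steps as the IndexReader formula above,
// but takes plain numbers instead of plan nodes (hypothetical signature).
func indexReaderCost(childCost, rows, rowSize, netFactor, seekCost float64, concurrency int) float64 {
	if concurrency < 1 {
		concurrency = 1 // guard added in this sketch to avoid dividing by zero
	}
	cost := childCost                  // child's cost
	cost += rows * rowSize * netFactor // net I/O cost
	cost += seekCost                   // net seek cost
	cost /= float64(concurrency)       // consider concurrency
	return cost
}

func main() {
	// (500 + 1000*16*1 + 300) / 15
	fmt.Println(indexReaderCost(500, 1000, 16, 1, 300, 15)) // 1120
}
```

With a child cost of 500, 1000 rows of 16 bytes, a network factor of 1, a seek cost of 300, and a concurrency of 15, the total is (500 + 16000 + 300) / 15 = 1120.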
After this refactoring, the cost model will be easier to maintain and calibrate.
The cost of a plan also becomes re-computable: Plan.CalCost() can be invoked multiple times and from different places, so other modules such as PlanCache, SPM, PlanRewriter, and JoinReorder may benefit from it as well.
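Re-computability can be illustrated with a join-reorder-style comparison: a module builds an alternative plan and asks both candidates for their cost, with no cost bookkeeping of its own. This is a hypothetical sketch; Scan, HashJoin, the build/probe factors, and the cheaper helper are invented for illustration and are not TiDB's JoinReorder code.

```go
package main

import "fmt"

// Plan is a hypothetical plan interface with an on-demand cost.
type Plan interface {
	CalCost() float64
	Rows() float64
}

// Scan is a leaf; hypothetically it costs one unit per scanned row.
type Scan struct{ rows float64 }

func (s *Scan) CalCost() float64 { return s.rows }
func (s *Scan) Rows() float64    { return s.rows }

// Hypothetical factors: building the hash table costs more per row than probing.
const (
	buildFactor = 3.0
	probeFactor = 1.0
)

// HashJoin builds a hash table on build and probes it with probe.
type HashJoin struct {
	build, probe Plan
	rows         float64 // hypothetical output row count
}

func (j *HashJoin) CalCost() float64 {
	return j.build.CalCost() + j.probe.CalCost() +
		j.build.Rows()*buildFactor + j.probe.Rows()*probeFactor
}

func (j *HashJoin) Rows() float64 { return j.rows }

// cheaper recomputes both costs and returns the lower-cost candidate.
func cheaper(a, b Plan) Plan {
	if b.CalCost() < a.CalCost() {
		return b
	}
	return a
}

func main() {
	small, large := &Scan{rows: 10}, &Scan{rows: 1000}
	j := &HashJoin{build: large, probe: small, rows: 10}
	swapped := &HashJoin{build: j.probe, probe: j.build, rows: j.rows}
	fmt.Println(j.CalCost(), swapped.CalCost()) // 4020 2040
	fmt.Println(cheaper(j, swapped) == swapped) // true
}
```

Since CalCost has no side effects, calling it repeatedly on either candidate, or from any other module, always yields the same answer for the same plan.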
Here is a detailed design doc (in Chinese) about this.
Here is the demo.