planner: refactor the implementation of cost calculation #33945

@qw4990

Enhancement

The current implementation of cost calculation is tightly coupled with physical optimization.
In physical optimization, the optimizer changes the plan structure frequently and updates the current cost correspondingly.
Here is an example:

func addPushedDownSelection(t *task) {
    tableReader := t.Plan.(*TableReader)
    pushedDownFilters := PushDownExprs(tableReader.filters)    // PHY-OPT
    selCost := t.Count() * CopCPUFactor                        // COST
    selection := NewSelection(pushedDownFilters)               // PHY-OPT
    selection.cost = selCost                                   // COST
    t.cost += selCost                                          // COST
    selection.Child = tableReader.Child                        // PHY-OPT
    tableReader.Child = selection                              // PHY-OPT
}

addPushedDownSelection pushes a selection down below the table reader. The COST and PHY-OPT comments mark whether a line performs cost calculation or changes the plan during physical optimization; the two kinds of logic are tightly interleaved.
The current implementation is hard to maintain and has already caused many issues: #32675 #32672 #32698 #27189 #30103 #32362.

To solve this problem thoroughly, we decided to refactor the implementation of cost calculation.
A new interface Plan.CalCost() float64 will be introduced, and all code related to cost calculation will be removed from physical optimization. The example above then becomes:

func addPushedDownSelection(t *task) {
    tableReader := t.Plan.(*TableReader)
    pushedDownFilters := PushDownExprs(tableReader.filters)    // PHY-OPT
    selection := NewSelection(pushedDownFilters)               // PHY-OPT
    selection.Child = tableReader.Child                        // PHY-OPT
    tableReader.Child = selection                              // PHY-OPT
}

All code related to cost calculation will be moved into the new, standard interface Plan.CalCost() float64.
Below is an example for IndexReader:

func (p *PhysicalIndexReader) CalPlanCost(taskType property.TaskType) float64 {
	p.planCost = p.indexPlan.CalPlanCost(property.CopSingleReadTaskType)   // COST: child's cost
	p.planCost += p.indexPlan.StatsCount() * getRowSize(p) * netFactor     // COST: net I/O cost
	p.planCost += getSeekCost(p)                                           // COST: net seek cost
	p.planCost /= float64(p.ctx.GetSessionVars().DistSQLScanConcurrency()) // COST: consider concurrency
	return p.planCost
}
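The recursive pattern in the IndexReader example can be tried in isolation. Below is a minimal, self-contained sketch, not TiDB's actual code: the `Plan` interface, the `IndexScan` and `IndexReader` types, and the `scanFactor` and `netFactor` constants are all hypothetical names and values chosen for illustration. Each operator computes its own cost from its child's cost, with no help from the optimizer.

```go
package main

import "fmt"

// Plan is a toy stand-in for the proposed interface (hypothetical, not TiDB's type).
type Plan interface {
	CalCost() float64
}

// Hypothetical cost factors, chosen only for this sketch.
const (
	scanFactor = 1.5
	netFactor  = 1.0
)

// IndexScan is a leaf operator that reads Rows rows of RowSize bytes each.
type IndexScan struct {
	Rows    float64
	RowSize float64
}

// CalCost charges a scan cost proportional to the bytes read.
func (s *IndexScan) CalCost() float64 {
	return s.Rows * s.RowSize * scanFactor
}

// IndexReader ships the rows produced by its child over the network.
type IndexReader struct {
	Child       *IndexScan
	Concurrency int
}

// CalCost adds the child's cost and the network cost, then divides by concurrency.
func (r *IndexReader) CalCost() float64 {
	cost := r.Child.CalCost()                          // child's cost
	cost += r.Child.Rows * r.Child.RowSize * netFactor // net I/O cost
	cost /= float64(r.Concurrency)                     // consider concurrency
	return cost
}

func main() {
	var p Plan = &IndexReader{
		Child:       &IndexScan{Rows: 1000, RowSize: 8},
		Concurrency: 4,
	}
	fmt.Println(p.CalCost()) // prints 5000
}
```

With these made-up factors the scan costs 12000, the network transfer adds 8000, and dividing by a concurrency of 4 gives 5000.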

After this refactoring, the cost model will be easier to maintain and calibrate.
The cost of a plan also becomes re-computable: Plan.CalCost() can be invoked multiple times from different places, and I think other modules (PlanCache, SPM, PlanRewriter, JoinReorder, ...) can also benefit from it.
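To illustrate re-computability, here is a small sketch under stated assumptions: the `TableScan`, `Selection`, and `TableReader` types, the `pushDownSelection` helper, and the `cpuFactor` value are hypothetical and much simpler than the real planner. The rewrite only rewires the tree, and the cost is derived afterwards by calling `CalCost()` again.

```go
package main

import "fmt"

// Plan is a toy stand-in for the proposed interface (hypothetical).
type Plan interface {
	CalCost() float64
}

// cpuFactor is a made-up per-row filtering cost.
const cpuFactor = 2.0

// TableScan is a leaf operator; its toy cost is one unit per row.
type TableScan struct{ Rows float64 }

func (s *TableScan) CalCost() float64 { return s.Rows }

// Selection filters InputRows rows coming from its child.
type Selection struct {
	Child     Plan
	InputRows float64
}

func (s *Selection) CalCost() float64 {
	return s.Child.CalCost() + s.InputRows*cpuFactor
}

// TableReader passes its child's cost through unchanged in this sketch.
type TableReader struct{ Child Plan }

func (r *TableReader) CalCost() float64 { return r.Child.CalCost() }

// pushDownSelection changes only the plan structure; it does no cost bookkeeping.
func pushDownSelection(r *TableReader, inputRows float64) {
	sel := &Selection{Child: r.Child, InputRows: inputRows}
	r.Child = sel
}

func main() {
	reader := &TableReader{Child: &TableScan{Rows: 100}}
	before := reader.CalCost() // cost of the original plan
	pushDownSelection(reader, 100)
	after := reader.CalCost() // recomputed from the new structure
	fmt.Println(before, after) // prints 100 300
}
```

Because the cost is a pure function of the current tree, a rewrite cannot leave a stale cost behind; whether and where to cache the result is a separate design decision.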

Here is a detailed design doc (in Chinese) about this.
Here is the demo.

Labels

sig/planner (SIG: Planner)
type/enhancement (The issue or PR belongs to an enhancement.)
