Skip to content

sql: collect partial statistics on arbitrary constraints using WHERE expressions #93998

@faizaanmadhani

Description

@faizaanmadhani

Currently, we only support collecting partial statistics at the extreme values of columns that are prefixes of an index. We should also try to support collecting partial statistics on user defined constraints by allowing users to define arbitrary WHERE clauses. Currently, the SQL grammar has been updated to support WHERE clauses inside the options list of a CREATE STATISTICS, but support should be added to allow users to specify the specific ranges they wish to query.

These spans would need to be created in the same place as we currently do for partial statistics USING EXTREMES, inside distsql_plan_stats.go. This could be done by (quoting @michae2 here):

To use the optimizer, we need to start with a tree.Statement and pass it to optbuilder.New. Then we call optbuilder.(*Builder).Build, and then xform.(*Optimizer).Optimize. After this we would call execbuilder.New and then execbuilder.(*Builder).Build to get a sql.planComponents. (You can see where these things are done for normal statements in pkg/sql/plan_opt.go, specifically starting with sql.(*planner).makeOptimizerPlan.) Finally we would probably need to call sql.(*DistSQLPlanner).createPhysPlan to turn the tree of sql.planNodes into a *sql.PhysicalPlan. (I forget if we still need this last step, maybe we don't now. For normal statements this is done when we call sql.(*DistSQLPlanner).PlanAndRunAll from sql.(*connExecutor).execWithDistSQLEngine.) Then we could add the samplers and sampleAggregator to the PhysicalPlan.

We would also need to update the merging functionality in merge.go to generalize it from prepending and appending histograms to merging histograms. Currently, we store the partial predicate (this would be the predicate from the WHERE clause) and we could use this to determine the range of our histogram to figure out how to merge.

Jira issue: CRDB-22641

Epic CRDB-19449

Metadata

Metadata

Assignees

Labels

A-sql-table-statsTable statistics (and their automatic refresh).C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)GA-blockerT-sql-queriesSQL Queries Teambranch-release-25.4Used to mark GA and release blockers and technical advisories for 25.4

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions