Skip to content

sql, stats: collect small histograms on all columns #49374

@rytaft

Description

@rytaft

Currently, we only collect histograms on the first column of each index in a table. The reason for this is that the default size of histograms is 200 buckets, so they use a non-trivial amount of space and require a non-trivial amount of computation in order to collect them and use them in the optimizer. Still, it's generally worth the extra space and computation for index columns; index columns are likely to be used in predicates, so histograms are likely to be useful for cardinality estimation.

We could still get some benefit from collecting smaller histograms on other columns, however. Collecting a histogram with 2 buckets would be equivalent to storing the maximum and minimum value for a column, which could significantly improve our selectivity estimation for some queries. Instead of assuming all inequalities (e.g., x < 10) have selectivity equal to 1/3, we can make an educated guess based on the domain of the variable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-sql-optimizerSQL logical planning and optimizations.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)C-performancePerf of queries or internals. Solution not expected to change functional behavior.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions