
opt: collect stats on inverted indexes #48219

@rytaft

Description

We currently do not collect statistics on inverted indexes, which results in poor cardinality and cost estimates for inverted index scans and inverted zigzag joins. This issue covers the work needed to collect statistics, but not the work needed to use them in the optimizer, which will be covered by a separate issue.

For inverted index statistics, we can probably reuse much of the machinery we've already built for collecting statistics on the primary index. As with normal column statistics, we'll want to store the number of distinct keys in the index as well as the total number of values (index entries) in the index. Since a single row can be indexed under multiple keys (for example, one per array element), this total will generally be greater than or equal to the number of rows in the table. Additionally, we'll want to collect histograms on inverted indexes, where each bucket represents a range of keys in the index and stores the number of values indexed by that range of keys.
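To make the statistics described above concrete, here is a minimal Go sketch. It assumes an inverted index on an array column where each row is indexed once under each distinct array element. All type and function names (`InvertedIndexStats`, `HistogramBucket`, `collectInvertedStats`) are hypothetical and are not the actual stats code. The histogram is built by splitting the sorted distinct keys into contiguous ranges, a simplified stand-in for real sampling-based histogram construction.

```go
package main

import (
	"fmt"
	"sort"
)

// HistogramBucket covers a contiguous range of index keys ending at
// UpperBound (inclusive) and records how many index entries fall in it.
// Hypothetical type, for illustration only.
type HistogramBucket struct {
	UpperBound string
	NumEntries int
}

// InvertedIndexStats is a hypothetical summary of one inverted index.
type InvertedIndexStats struct {
	RowCount      int // rows in the table
	DistinctCount int // distinct keys in the inverted index
	EntryCount    int // total index entries (values) across all keys
	Histogram     []HistogramBucket
}

// collectInvertedStats computes stats for an inverted index on an array
// column. Assumption: a row contributes one entry per distinct element,
// so duplicates within the same array are indexed only once.
func collectInvertedStats(rows [][]string, maxBuckets int) InvertedIndexStats {
	counts := make(map[string]int) // key -> number of index entries
	entries := 0
	for _, arr := range rows {
		seen := make(map[string]bool)
		for _, elem := range arr {
			if seen[elem] {
				continue
			}
			seen[elem] = true
			counts[elem]++
			entries++
		}
	}

	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Split the sorted distinct keys into at most maxBuckets contiguous
	// ranges and sum the entry counts of the keys in each range.
	var hist []HistogramBucket
	if len(keys) > 0 && maxBuckets > 0 {
		perBucket := (len(keys) + maxBuckets - 1) / maxBuckets
		for start := 0; start < len(keys); start += perBucket {
			end := start + perBucket
			if end > len(keys) {
				end = len(keys)
			}
			n := 0
			for _, k := range keys[start:end] {
				n += counts[k]
			}
			hist = append(hist, HistogramBucket{UpperBound: keys[end-1], NumEntries: n})
		}
	}

	return InvertedIndexStats{
		RowCount:      len(rows),
		DistinctCount: len(keys),
		EntryCount:    entries,
		Histogram:     hist,
	}
}

func main() {
	rows := [][]string{
		{"a", "b"},
		{"b", "c", "c"},
		{"a", "b", "d"},
	}
	s := collectInvertedStats(rows, 2)
	fmt.Println(s.RowCount, s.DistinctCount, s.EntryCount) // 3 4 7
	for _, b := range s.Histogram {
		fmt.Println(b.UpperBound, b.NumEntries) // "b 5", then "d 2"
	}
}
```

In this example the table has 3 rows but the index holds 7 entries over 4 distinct keys, which illustrates why the total number of values needs to be tracked separately from the row count. Note that a row with an empty array would contribute no entries under this sketch's assumptions.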

As described in the Geospatial RFC, improving statistics on inverted indexes will benefit the work we're doing for geospatial support, as well as for JSON and array inverted index support.

Metadata

Labels: A-spatial (Spatial work that is *not* related to builtins.), C-enhancement (Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception))