Skip to content

opt: Add support for soft limit propagation #34811

@andy-kimball

Description

@andy-kimball

"Hard limits" put an absolute cap on the number of rows returned by an operator. "Soft limits" indicate the likely maximum number of rows that will be returned by an operator.

While the CBO currently supports hard limits, it does not support soft limits. It should propagate soft limit information through the opt tree as a physical property, and then use that information to make better cost estimates.

Here is an example (inspired by a real customer query) where soft limits would allow us to select the best plan:

SELECT w, x, y, z
FROM wxyz
INNER JOIN abcde
ON w = a AND x = b AND y = c
WHERE w = 'foo' AND x = '2AB23800-06B1-4E19-A3BB-DF3768B808D2'
ORDER BY d
LIMIT 10;

EXPECTED: The CBO should use the idx_abd index to order by d, and then do a lookup join to get 10 matching rows.

ACTUAL: The CBO uses a merge join using the idx_abcd index. It does this because it assumes the worst case when using the idx_abd index. Theoretically, if none of the rows match wxyz rows, then it'd be necessary to scan the entire index, which would be very costly. However, given a LIMIT 10, that's very unlikely. The CBO can propagate down a soft limit that incorporates the likely number of rows that are needed, and then the coster can better estimate the cost of the idx_abd index scan based on that.

And here is the schema and statistics for the query:

CREATE TABLE abcde (
	a TEXT NOT NULL,
	b UUID NOT NULL,
	c UUID NOT NULL,
	d VARCHAR(255) NOT NULL,
	e TEXT NOT NULL,
	CONSTRAINT "primary" PRIMARY KEY (a, b, c),
	UNIQUE INDEX idx_abd (a, b, d),
	UNIQUE INDEX idx_abcd (a, b, c, d)
);

ALTER TABLE abcde INJECT STATISTICS '[
  {
    "columns": ["a"],
    "created_at": "2019-02-08 04:10:40.001179+00:00",
    "row_count": 250000,
    "distinct_count": 1
  },
  {
    "columns": ["b"],
    "created_at": "2019-02-08 04:10:40.119954+00:00",
    "row_count": 250000,
    "distinct_count": 2
  },
  {
    "columns": ["d"],
    "created_at": "2019-02-08 04:10:40.119954+00:00",
    "row_count": 250000,
    "distinct_count": 125000
  }
]';

CREATE TABLE wxyz (
	w TEXT NOT NULL,
	x UUID NOT NULL,
	y UUID NOT NULL,
	z TEXT NOT NULL,
	CONSTRAINT "primary" PRIMARY KEY (w, x, y),
	CONSTRAINT "foreign" FOREIGN KEY (w, x, y) REFERENCES abcde (a, b, c)
);

ALTER TABLE wxyz INJECT STATISTICS '[
  {
    "columns": ["w"],
    "created_at": "2019-02-08 04:10:40.001179+00:00",
    "row_count": 10000,
    "distinct_count": 1
  },
  {
    "columns": ["x"],
    "created_at": "2019-02-08 04:10:40.119954+00:00",
    "row_count": 10000,
    "distinct_count": 1
  },
  {
    "columns": ["y"],
    "created_at": "2019-02-08 04:10:40.119954+00:00",
    "row_count": 10000,
    "distinct_count": 2500
  }
]';

Metadata

Metadata

Labels

C-performancePerf of queries or internals. Solution not expected to change functional behavior.

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions