Skip to content

sql: stats being different leads to different plans when using a placeholder #99389

@cucaroach

Description

@cucaroach

This originated from https://github.com/cockroachlabs/support/issues/2179. Basically we a get a RowCount of 1 w/o placeholders and a RowCount of 1/3 the columns distinct count w/ a placeholder. Here's a really straightforward distilled repro courtesy @DrewKimball :

exec-ddl
CREATE TABLE xy (x INT, y INT, INDEX (x) STORING (y));
----
exec-ddl
ALTER TABLE xy INJECT STATISTICS '[
  {
    "columns": ["x"],
    "created_at": "2018-01-01 1:00:00.00000+00:00",
    "row_count": 100000,
    "distinct_count": 100000
  },
  {
    "columns": ["y"],
    "created_at": "2018-01-01 1:00:00.00000+00:00",
    "row_count": 100000,
    "distinct_count": 2
  }
]';
----
opt format=show-all
WITH RECURSIVE cte(a,b) AS (
  (SELECT * FROM xy WHERE x = 1)
  UNION ALL
  (SELECT * FROM cte WHERE False)
)
SELECT * FROM cte;
----

If you replace 1 with $1 the problem becomes apparent. Doesn't matter if you use this simple approach or assign-placeholders-opt query-args=(1) format=show-all.

I've narrowed it down to (sb *statisticsBuilder) applyConstraintSet but haven't completed the investigation. In the customer issue the result is a hash join vs a lookup join which is believed to be causing poor performance.

Jira issue: CRDB-25906

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.O-supportWould prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docsT-sql-queriesSQL Queries Teamv23.1.2

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions