-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Analyzer: Subquery filter not using primary key #48936
Copy link
Copy link
Closed
Labels
experimental featureBug in the feature that should not be used in productionBug in the feature that should not be used in production
Description
Describe the unexpected behaviour
When using a subquery to filter a table the primary/sorting key is not being used.
How to reproduce
This is happening in CH 23.3 but also reproduced in master:
SELECT version()
┌─version()─┐
│ 23.8.1.1 │
└───────────┘The minimum reproducible script:
DROP TABLE IF EXISTS test;
CREATE TABLE test.test
ENGINE = MergeTree
ORDER BY (pk1, pk2)
SETTINGS index_granularity = 8
AS
SELECT
10 pk1,
number pk2,
'test_' || toString(number) str
FROM numbers(20);
EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (5))
SETTINGS allow_experimental_analyzer = 1;
EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (SELECT 5))
SETTINGS allow_experimental_analyzer = 1;
EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (SELECT number FROM numbers(5, 1)))
SETTINGS allow_experimental_analyzer = 1;When filtering by a subquery, the filter is not being used. See both query plans:
┌─explain─────────────────────────────────────────────────────────────┐
│ Expression ((Project names + Projection)) │
│ Expression (Change column names to column identifiers) │
│ ReadFromMergeTree (test.test) │
│ Indexes: │
│ PrimaryKey │
│ Keys: │
│ pk1 │
│ pk2 │ <= pk2 being used
│ Condition: and((pk1 in (-Inf, 10]), (pk2 in 1-element set)) │
│ Parts: 1/1 │
│ Granules: 1/3 │
└─────────────────────────────────────────────────────────────────────┘
┌─explain─────────────────────────────────────────────────────────────────────────────────────┐
│ CreatingSets (Create sets before main query execution) │
│ Expression ((Project names + Projection)) │
│ Expression (Change column names to column identifiers) │
│ ReadFromMergeTree (test.test) │
│ Indexes: │
│ PrimaryKey │
│ Keys: │
│ pk1 │ <= pk2 is gone
│ Condition: (pk1 in (-Inf, 10]) │
│ Parts: 1/1 │
│ Granules: 3/3 │
│ CreatingSet (Create set for subquery) │
│ Expression ((Project names + (Projection + Change column names to column identifiers))) │
│ ReadFromStorage (SystemOne) │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
┌─explain───────────────────────────────────────────────────────────────┐
│ CreatingSets (Create sets before main query execution) │
│ Expression ((Project names + Projection)) │
│ Expression (Change column names to column identifiers) │
│ ReadFromMergeTree (test.test) │
│ Indexes: │
│ PrimaryKey │
│ Keys: │
│ pk1 │
│ pk2 │
│ Condition: and((pk1 in (-Inf, 10]), (pk2 in 1-element set)) │
│ Parts: 1/1 │
│ Granules: 1/3 │
└───────────────────────────────────────────────────────────────────────┘I have used SELECT 5 as subquery for simplicity reasons but this reproduces with any kind of subquery.
Expected behavior
When using a subquery I'd expect sorting keys to be used also. In real scenarios this is causing reading more data than expected.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
experimental featureBug in the feature that should not be used in productionBug in the feature that should not be used in production