Skip to content

Analyzer: Subquery filter not using primary key #48936

@jrdi

Description

@jrdi

Describe the unexpected behaviour

When using a subquery to filter a table the primary/sorting key is not being used.

How to reproduce

This is happening in CH 23.3 but also reproduced in master:

SELECT version()

┌─version()─┐
│ 23.8.1.1  │
└───────────┘

The minimum reproducible script:

DROP TABLE IF EXISTS test;

CREATE TABLE test.test
ENGINE = MergeTree
ORDER BY (pk1, pk2)
SETTINGS index_granularity = 8
AS
SELECT
    10 pk1,
    number pk2,
    'test_' || toString(number) str
FROM numbers(20);

EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (5))
SETTINGS allow_experimental_analyzer = 1;

EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (SELECT 5))
SETTINGS allow_experimental_analyzer = 1;

EXPLAIN indexes = 1
SELECT *
FROM test.test
WHERE pk1 <= 10 AND (pk2 IN (SELECT number FROM numbers(5, 1)))
SETTINGS allow_experimental_analyzer = 1;

When filtering by a subquery, the filter is not being used. See both query plans:

┌─explain─────────────────────────────────────────────────────────────┐
│ Expression ((Project names + Projection))                           │
│   Expression (Change column names to column identifiers)            │
│     ReadFromMergeTree (test.test)                                   │
│     Indexes:                                                        │
│       PrimaryKey                                                    │
│         Keys:                                                       │
│           pk1                                                       │
│           pk2                                                       │ <= pk2 being used
│         Condition: and((pk1 in (-Inf, 10]), (pk2 in 1-element set)) │
│         Parts: 1/1                                                  │
│         Granules: 1/3                                               │
└─────────────────────────────────────────────────────────────────────┘
┌─explain─────────────────────────────────────────────────────────────────────────────────────┐
│ CreatingSets (Create sets before main query execution)                                      │
│   Expression ((Project names + Projection))                                                 │
│     Expression (Change column names to column identifiers)                                  │
│       ReadFromMergeTree (test.test)                                                         │
│       Indexes:                                                                              │
│         PrimaryKey                                                                          │
│           Keys:                                                                             │
│             pk1                                                                             │ <= pk2 is gone
│           Condition: (pk1 in (-Inf, 10])                                                    │
│           Parts: 1/1                                                                        │
│           Granules: 3/3                                                                     │
│   CreatingSet (Create set for subquery)                                                     │
│     Expression ((Project names + (Projection + Change column names to column identifiers))) │
│       ReadFromStorage (SystemOne)                                                           │
└─────────────────────────────────────────────────────────────────────────────────────────────┘

┌─explain───────────────────────────────────────────────────────────────┐
│ CreatingSets (Create sets before main query execution)                │
│   Expression ((Project names + Projection))                           │
│     Expression (Change column names to column identifiers)            │
│       ReadFromMergeTree (test.test)                                   │
│       Indexes:                                                        │
│         PrimaryKey                                                    │
│           Keys:                                                       │
│             pk1                                                       │
│             pk2                                                       │
│           Condition: and((pk1 in (-Inf, 10]), (pk2 in 1-element set)) │
│           Parts: 1/1                                                  │
│           Granules: 1/3                                               │
└───────────────────────────────────────────────────────────────────────┘

I have used SELECT 5 as subquery for simplicity reasons but this reproduces with any kind of subquery.

Expected behavior

When using a subquery I'd expect sorting keys to be used also. In real scenarios this is causing reading more data than expected.

Metadata

Metadata

Labels

experimental featureBug in the feature that should not be used in production

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions