-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Filter-push-down does not remove unneeded columns from JOIN #75152
Copy link
Copy link
Closed
Labels
comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...
Description
Let's compare
with 1e5 as cnt
select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as r on l.k = r.k where l.v != 12094310375782415882 and r.v != 12094310375782415882 settings query_plan_use_new_logical_join_step=1;
┌─count()─┐
1. │ 99999 │
└─────────┘
1 row in set. Elapsed: 2.456 sec. Processed 200.00 thousand rows, 1.60 MB (81.44 thousand rows/s., 651.53 KB/s.)
Peak memory usage: 1.13 GiB.
and
with 1e5 as cnt
select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as r on l.k = r.k settings query_plan_use_new_logical_join_step=1;
┌─count()─┐
1. │ 99999 │
└─────────┘
1 row in set. Elapsed: 2.075 sec. Processed 200.00 thousand rows, 1.60 MB (96.37 thousand rows/s., 770.94 KB/s.)
Peak memory usage: 1.13 GiB.
The second query is faster, even though the predicate-push-down optimization is applied.
The reason is that unneeded columns are not removed from JOIN.
:) explain header=1 with 1e5 as cnt select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt)) as r on l.k = r.k where l.v != 12094310375782415882 and r.v != 12094310375782415882 settings query_plan_use_new_logical_join_step=1;
EXPLAIN header = 1
WITH 100000. AS cnt
SELECT count()
FROM
(
SELECT
number AS k,
sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
FROM numbers(cnt)
) AS l
INNER JOIN
(
SELECT
number AS k,
sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
FROM numbers(cnt)
) AS r ON l.k = r.k
WHERE (l.v != 12094310375782415882) AND (r.v != 12094310375782415882)
SETTINGS query_plan_use_new_logical_join_step = 1
Query id: 8c8038ea-b266-47b5-91de-e5e775f7ead9
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Header: count() UInt64 │
3. │ Aggregating │
4. │ Header: count() UInt64 │
5. │ Expression ((Before GROUP BY + )) │
6. │ Empty header │
7. │ Expression │
8. │ Header: __table1.v UInt64 │
9. │ __table3.v UInt64 │
10. │ Join │
11. │ Header: __table1.v UInt64 │
12. │ __table3.v UInt64 │
13. │ Expression │
14. │ Header: __table1.k UInt64 │
15. │ __table1.v UInt64 │
16. │ Filter (( + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers))))) │
17. │ Header: __table1.k UInt64 │
18. │ __table1.v UInt64 │
19. │ ReadFromSystemNumbers │
20. │ Header: number UInt64 │
21. │ Expression │
22. │ Header: __table3.k UInt64 │
23. │ __table3.v UInt64 │
24. │ Filter (( + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers))))) │
25. │ Header: __table3.k UInt64 │
26. │ __table3.v UInt64 │
27. │ ReadFromSystemNumbers │
28. │ Header: number UInt64 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
28 rows in set. Elapsed: 0.020 sec. Note that columns __table1.v and __table3.v are not needed anymore after the optimization, but we keep it (e.g. in the hash table for hash_join).
Unneeded columns do not appear for the manually-optimized query.
:) explain header=1 with 1e5 as cnt select count() from (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as l inner join (select number as k, sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) as v from numbers(cnt) where v != 12094310375782415882) as r on l.k = r.k
EXPLAIN header = 1
WITH 100000. AS cnt
SELECT count()
FROM
(
SELECT
number AS k,
sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
FROM numbers(cnt)
WHERE v != 12094310375782415882
) AS l
INNER JOIN
(
SELECT
number AS k,
sipHash64(arrayStringConcat(arrayMap(x -> number, range(number % 1000)), ';')) AS v
FROM numbers(cnt)
WHERE v != 12094310375782415882
) AS r ON l.k = r.k
Query id: cea54d57-7cd4-488f-8597-ba6d58fd6368
┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Header: count() UInt64 │
3. │ Aggregating │
4. │ Header: count() UInt64 │
5. │ Expression ((Before GROUP BY + Drop unused columns after JOIN)) │
6. │ Empty header │
7. │ Join (JOIN FillRightFirst) │
8. │ Header: __table1.k UInt64 │
9. │ Expression ((JOIN actions + (Change column names to column identifiers + (Project names + Projection)))) │
10. │ Header: __table1.k UInt64 │
11. │ Filter ((WHERE + Change column names to column identifiers)) │
12. │ Header: __table2.number UInt64 │
13. │ ReadFromSystemNumbers │
14. │ Header: number UInt64 │
15. │ Expression ((JOIN actions + (Change column names to column identifiers + (Project names + Projection)))) │
16. │ Header: __table3.k UInt64 │
17. │ Filter ((WHERE + Change column names to column identifiers)) │
18. │ Header: __table4.number UInt64 │
19. │ ReadFromSystemNumbers │
20. │ Header: number UInt64 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘Note that __table1.k is also not needed in this case. We probably keep it to avoid an empty block. We can fix it as well, but it is less important.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
comp-joinsJOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...JOINs end-to-end (planning hooks + runtime join operators/algorithms). Single bucket to avoid pla...