
Merge filter into JOIN condition #78877

Merged
novikd merged 17 commits into master from merge-filter-into-join on Apr 15, 2025

novikd commented Apr 9, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Merge equality conditions from filter query plan step into JOIN condition if possible to allow using them as hash table keys.
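
To illustrate the idea in the changelog entry, here is a minimal sketch on a toy model. It is not the code from this PR: `Side`, `Column`, `Predicate`, `JoinStep` and `mergeFilterIntoJoin` are hypothetical names, and the sketch assumes an INNER join, where moving a `left.x = right.y` conjunct from the filter into the JOIN condition does not change the result. Which join kinds the real optimization handles is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model; these are not ClickHouse planner types.
enum class Side { Left, Right };

struct Column
{
    Side side;          // which JOIN input the column comes from
    std::string name;
};

struct Predicate
{
    std::string op;     // "=", "<", ...
    Column lhs;
    Column rhs;
};

struct JoinStep
{
    // (left column, right column) pairs usable as hash table keys
    std::vector<std::pair<std::string, std::string>> keys;
};

// Moves every `left_col = right_col` conjunct of the filter into the join keys
// and returns the conjuncts that must stay in the filter.
// Assumption: the join is INNER, so the move preserves semantics.
std::vector<Predicate> mergeFilterIntoJoin(const std::vector<Predicate> & conjuncts, JoinStep & join)
{
    const std::size_t keys_before = join.keys.size();
    std::vector<Predicate> residual;

    for (const Predicate & p : conjuncts)
    {
        const bool crosses_sides = p.lhs.side != p.rhs.side;
        if (p.op == "=" && crosses_sides)
        {
            // Normalize so that the left-side column always comes first.
            const Column & left = p.lhs.side == Side::Left ? p.lhs : p.rhs;
            const Column & right = p.lhs.side == Side::Left ? p.rhs : p.lhs;
            join.keys.emplace_back(left.name, right.name);
        }
        else
            residual.push_back(p);
    }

    // Every conjunct ends up either as a join key or in the residual filter.
    assert(join.keys.size() - keys_before + residual.size() == conjuncts.size());
    return residual;
}
```

In this model an equality between two columns of the same side, or a non-equality comparison across sides, stays in the filter, since neither can serve as a hash table key.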

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

clickhouse-gh bot commented Apr 9, 2025

Workflow [PR], commit [2f9e9bb]

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Apr 9, 2025
@novikd novikd added the ci-performance performance only label Apr 9, 2025

novikd commented Apr 9, 2025

Nice
(attached image)

@novikd novikd removed the ci-performance performance only label Apr 9, 2025

novikd commented Apr 11, 2025

CH Inc sync — tests failed

Unrelated

Integration tests (aarch64, 1/4) — fail: 1, passed: 811

Unrelated

Integration tests (release, 3/4) — fail: 1, passed: 891

Unrelated

@novikd novikd requested a review from KochetovNicolai April 11, 2025 09:38
Comment on lines +337 to +338
join_expressions.left_pre_join_actions->mergeNodes(std::move(predicate.left));
join_expressions.right_pre_join_actions->mergeNodes(std::move(predicate.right));

mergeNodes is a dangerous function with a limited scope of use, because it matches DAG nodes by name. But we don't have a DAG invariant that node names are unique and correct (on the contrary, a DAG can contain any aliases).

Just ::merge should work, but I guess I need to debug why it does not.

Another option is to write a custom function that matches only inputs (it should be simple, actually: it would require replacing the rhs DAG inputs with the matching lhs DAG inputs and uniting the lists of nodes).
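
A rough sketch of what such an inputs-only merge could look like, on a toy DAG rather than the real ActionsDAG. `Node`, `Dag` and `mergeMatchingInputs` are hypothetical names, and the sketch assumes that input names are unique within each DAG. Non-input nodes are never matched by name, so an alias that happens to collide stays a separate node.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy DAG; this is not ClickHouse's ActionsDAG.
struct Node
{
    enum class Kind { Input, Function } kind;
    std::string name;                    // result name; may collide for non-input nodes
    std::vector<const Node *> children;
};

struct Dag
{
    std::list<Node> nodes;               // std::list keeps node addresses stable
    std::vector<const Node *> inputs;
};

// Merges rhs into lhs, matching ONLY input nodes by name.
// Assumption: input names are unique within each DAG.
void mergeMatchingInputs(Dag & lhs, Dag && rhs)
{
    std::unordered_map<std::string, const Node *> lhs_inputs;
    for (const Node * input : lhs.inputs)
        lhs_inputs.emplace(input->name, input);

    // rhs input -> lhs input that replaces it
    std::unordered_map<const Node *, const Node *> replacement;
    for (const Node * input : rhs.inputs)
    {
        assert(input->kind == Node::Kind::Input);
        auto it = lhs_inputs.find(input->name);
        if (it != lhs_inputs.end())
            replacement.emplace(input, it->second);
        else
            lhs.inputs.push_back(input); // new input, keep it
    }

    // Rewire rhs nodes so they read from the matching lhs inputs.
    for (Node & node : rhs.nodes)
        for (const Node *& child : node.children)
            if (auto it = replacement.find(child); it != replacement.end())
                child = it->second;

    // Drop the now-unused duplicate inputs, then unite the node lists.
    rhs.nodes.remove_if([&](const Node & node) { return replacement.count(&node) > 0; });
    lhs.nodes.splice(lhs.nodes.end(), rhs.nodes);
    rhs.inputs.clear();
}
```

Because `splice` moves list nodes without copying them, pointers held in `children` and `inputs` stay valid after the merge.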


KochetovNicolai left a comment


Overall looks good; only an additional cast to the return type may be needed after the filter condition is modified.


novikd commented Apr 15, 2025

Stateless tests (release, ParallelReplicas, s3 storage) — Server died, fail: 12, passed: 4230, skipped: 452

#75969

Integration tests (tsan, 6/6) — fail: 2, passed: 557

#71049

Integration tests (asan, old analyzer, 4/6) — fail: 1, passed: 552

Unrelated

@novikd novikd added this pull request to the merge queue Apr 15, 2025
Merged via the queue into master with commit df7a54f Apr 15, 2025
113 of 122 checks passed
@novikd novikd deleted the merge-filter-into-join branch April 15, 2025 09:09
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 15, 2025
robot-ch-test-poll1 added a commit that referenced this pull request Apr 15, 2025
Cherry pick #78877 to 25.4: Merge filter into JOIN condition
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 15, 2025
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-backports-created-cloud deprecated label, NOOP label Apr 15, 2025
novikd added a commit that referenced this pull request Apr 16, 2025
Backport #78877 to 25.4: Merge filter into JOIN condition
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2025


7 participants