Hash join: automatically choose build side #68682
We also need to enable the setting in integration tests via #69328. However, the rest should be alright.
void optimizePrewhere(Stack & stack, QueryPlan::Nodes & nodes);
void optimizeReadInOrder(QueryPlan::Node & node, QueryPlan::Nodes & nodes);
void optimizeAggregationInOrder(QueryPlan::Node & node, QueryPlan::Nodes &);
void optimizeJoin(QueryPlan::Node & node, QueryPlan::Nodes &);
return;

const auto & join = join_step->getJoin();
if (join->pipelineType() != JoinPipelineType::FillRightFirst || !join->isCloneSupported() || typeid_cast<const HashJoin *>(join.get()))
could we allow ConcurrentHashJoin here too?
I enabled isCloneSupported for ConcurrentHashJoin, should work for that implementation as well
const auto & table_join = join->getTableJoin();
auto kind = table_join.kind();
if (table_join.hasUsing()
Actually, it's a TODO: the way USING is implemented depends on the table order. It can be fixed, but it's a bit of a headache.
ClickHouse/src/Processors/QueryPlan/Optimizations/optimizeJoin.cpp
Lines 64 to 65 in a457683
join_step->swap_streams = true;

auto updated_table_join = std::make_shared<TableJoin>(table_join);
updated_table_join->swapSides();
It seems that, in a distributed query, different nodes could make different optimization decisions here. AFAIU we change the output header too. Is that actually a problem, and should we maybe take some action to preserve the header?
Implemented the transformations inside JoinStep, so the column order stays unchanged.
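To illustrate the idea of keeping the header stable after a side swap, here is a minimal standalone sketch. It is not the JoinStep code from this PR: `Header`, `joinHeader`, and `restoreOrder` are hypothetical names, and a header is modeled as a plain list of column names. It assumes the join emits the columns of its first input followed by the columns of its second input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a block header: an ordered list of column names.
using Header = std::vector<std::string>;

// Assumed join output layout: columns of the first input, then columns of the second.
Header joinHeader(const Header & first, const Header & second)
{
    Header out = first;
    out.insert(out.end(), second.begin(), second.end());
    return out;
}

// After the sides are swapped, the join emits right columns followed by left columns.
// Rotate them back so that downstream steps still see left columns first.
Header restoreOrder(const Header & swapped_output, std::size_t left_size, std::size_t right_size)
{
    assert(swapped_output.size() == left_size + right_size);

    const auto split = swapped_output.begin() + static_cast<std::ptrdiff_t>(right_size);

    Header out;
    out.reserve(swapped_output.size());
    out.insert(out.end(), split, swapped_output.end());   // original left columns
    out.insert(out.end(), swapped_output.begin(), split); // original right columns
    return out;
}
```

With this kind of reordering, `restoreOrder(joinHeader(right, left), left.size(), right.size())` yields the same header as `joinHeader(left, right)`, so every node produces the same output header whether or not it decided to swap.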
Signed-off-by: vdimir <vdimir@clickhouse.com>
{"restore_replace_external_dictionary_source_to_null", false, false, "New setting."},
{"show_create_query_identifier_quoting_rule", "when_necessary", "when_necessary", "New setting."},
{"show_create_query_identifier_quoting_style", "Backticks", "Backticks", "New setting."},
{"query_plan_join_inner_table_selection", "auto", "auto", "New setting."},
* [GLUTEN-1632][CH] Daily Update Clickhouse Version (20241105)
* Fix Build due to ClickHouse/ClickHouse#71261
* Fix Build due to ClickHouse/ClickHouse#68682
Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>
* [GLUTEN-1632][CH] Daily Update Clickhouse Version (20241107)
* Revert "Fix Build due to ClickHouse/ClickHouse#68682", see ClickHouse/ClickHouse#71527
Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>
…lect_inner_table Revert "Resubmit #68682"
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added an option to select the side of the join that will act as the inner table in the query plan. This is controlled by `query_plan_join_inner_table_selection`, which can be set to `auto`. In this mode, ClickHouse will try to choose the table with the smallest number of rows.
Resubmitted: Resubmit #68682 #71577
Documentation entry for user-facing changes
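As a rough illustration of the `auto` mode described in the changelog entry, the sketch below shows one way such a decision could be made. It is not the code from `optimizeJoin.cpp`: `JoinInputEstimates` and `shouldSwapSides` are hypothetical names, and the sketch assumes that the right input is the one the hash table is built from (as with the `FillRightFirst` pipeline type checked in the diff) and that row-count estimates may be unavailable.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical row-count estimates for the two join inputs.
// std::nullopt means the estimate is not available.
struct JoinInputEstimates
{
    std::optional<std::uint64_t> left_rows;
    std::optional<std::uint64_t> right_rows;
};

// Returns true when the sides should be swapped so that the smaller input
// ends up on the right, which is assumed to be the build (inner) side.
// If either estimate is missing, the original order is kept.
bool shouldSwapSides(const JoinInputEstimates & estimates)
{
    if (!estimates.left_rows || !estimates.right_rows)
        return false;

    // Swap only when the left input is strictly smaller; ties keep the original order.
    return *estimates.left_rows < *estimates.right_rows;
}
```

Keeping the original order when an estimate is missing, or when both sides are the same size, is a conservative choice for this sketch: the plan only changes when there is a clear reason to expect a smaller build side.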