Improve all join performance by append RowRefList or RowRef to AddedColumns for lazy output#63677
Conversation
RowRefList to LazyOutput directlyRowRefList / RowRef to LazyOutput directly
RowRefList / RowRef to LazyOutput directlyRowRefList or RowRef to LazyOutput directly
RowRefList or RowRef to LazyOutput directlyRowRefList or RowRef to AddedColumns for lazy output
|
This is an automated comment for commit 95e07ce with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
9dbaac4 to
ec1886b
Compare
src/Interpreters/RowRefs.h
Outdated
There was a problem hiding this comment.
it is not a good idea to increase RowRefList size. do we really need this field?
There was a problem hiding this comment.
yes, this is needed. it will be used to compute the current_offset in addFoundRowAll
src/Interpreters/HashJoin.cpp
Outdated
There was a problem hiding this comment.
imo it would be better to avoid using func pointers (i.e. replace with template parameters) to avoid indirect call here
tests/performance/all_join_opt.xml
Outdated
There was a problem hiding this comment.
why do we test only with max_threads=1?
669f417 to
b44ea2f
Compare
|
pls check reported slowdowns in |
|
yes, I see. Maybe it it same problem as |
|
It seems the reference in the for loop of |
src/Interpreters/HashJoin.cpp
Outdated
There was a problem hiding this comment.
and why it is beneficial in the end? just because we filling in one array instead of two?
There was a problem hiding this comment.
I think it is because the fill array operation of addFoundRowAll is inside the joinRightColumns for loop, which will scan all the left table rows, and if the right table rows has a lots of matched rows per key, then the time complexity is very high. so I move the for loop of addFoundRowAll to the buildOutput to reduce the time complexity of joinRightColumns. and on another hand, the old code will fill 2 array, and now it fill one, and it can also improve the performance.
But like the case in join_append_block, every row in right table matched is single one, which will not reduce the time complexity, and it slowdown as the buildOutput's for loop cache miss of block. @nickitat
There was a problem hiding this comment.
indeed, it could be almost fixed by the following patch: https://pastila.nl/?0169ab09/eb769dfb95b000c681651866b8c89d86#VYcmBdzNznC3fy/9hHzw3Q==
afaiu the similar problem should exist for buildOutputFromRowRef as well, so it makes sense to unify their implementation and use this patch in both cases
There was a problem hiding this comment.
indeed, it could be almost fixed by the following patch: https://pastila.nl/?0169ab09/eb769dfb95b000c681651866b8c89d86#VYcmBdzNznC3fy/9hHzw3Q== afaiu the similar problem should exist for
buildOutputFromRowRefas well, so it makes sense to unify their implementation and use this patch in both cases
This patch make test case hashjoin_with_large_output 10% slowdown. @nickitat , should we add a setting like a threshold, when the matched rows number exceed this threshold, then we use buildOutFromRowRefList, otherwise, output by a vector of blocks like in this patch?
|
overall looks ok for me, I'll check perf myself |
e1992fc to
397deee
Compare
|
@nickitat I had add a settings |
There was a problem hiding this comment.
pls leave a comment explaining why do we do this. it is really tricky place
5f874cc to
85bd63a
Compare
|
could this pr be merged ? @nickitat |
|
thanks for the contribution and your patience ) |
| \ | ||
| M(Bool, join_use_nulls, false, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.", IMPORTANT) \ | ||
| \ | ||
| M(Int32, join_output_by_rowlist_perkey_rows_threshold, 5, "The lower limit of per-key average rows in the right table to determine whether to output by row list in hash join.", 0) \ |
There was a problem hiding this comment.
The setting is used as unsigned int (size_t) but declared as Int32 here. It makes sense to make it unsigned (before 24.9)

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve all join perfromance by append
RowRefListorRowRefto AddedColumns for lazy output, while buildOutput, we useRowRefList/RowReffor output, and removeis_join_getcondition frombuildOutputfor loop.And we test as below, the left table
test1((a Int64, b String, c LowCardinality(String), has 10000000 rows, the right table test2(a Int64, b String, c LowCardinality(String)) has 100000 rows, the test sqlSELECT MAX(test1.a) FROM test1 INNER JOIN test2 on test1.b = test2.b SETTINGS max_threads=1before this pr:
after this pr:
Documentation entry for user-facing changes
Modify your CI run
NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step
Include tests (required builds will be added automatically):
Exclude tests:
Extra options:
Only specified batches in multi-batch jobs:
Details