Skip to content

Conversation

@zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Jan 6, 2024

Rationale for this change

Add residual filter support to swiss join.

What changes are included in this PR?

  1. Added class JoinResidualFilter as a centralized structure to evaluate residual filter in swiss join. It has various flavors of filtering for various join types. Zero-overhead is guaranteed for trivial filters (literal true and sometimes literal false/null). More detailed explanation in code comments.
  2. Tuned the structure of swiss join main body (JoinProbeProcessor::OnNextBatch) to better cope with JoinResidualFilter calls.

Are these changes tested?

Legacy UTs (HashJoin.Random, HashJoin.ResidualFilter and HashJoin.TrivialResidualFilter) cover part of this change. New fine-grained residual filter cases added as well.

Are there any user-facing changes?

No.

@github-actions
Copy link

github-actions bot commented Jan 6, 2024

⚠️ GitHub issue #20339 has been automatically assigned in GitHub to PR creator.

@zanmato1984 zanmato1984 changed the title GH-20339: [C++] Add residual predicate support to swiss join GH-20339: [C++] Add residual filter support to swiss join Jan 6, 2024
return static_cast<uint32_t>(first_greater - entries) - 1;
}

void SwissTableForJoin::payload_ids_to_key_ids(int num_rows, const uint32_t* payload_ids,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted because used nowhere in existing code base.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 8, 2024
@vibhatha
Copy link
Contributor

vibhatha commented Jan 9, 2024

@zanmato1984 would it be possible to add some test cases?

@zanmato1984
Copy link
Contributor Author

@zanmato1984 would it be possible to add some test cases?

Yes, I am working on that.

@vibhatha
Copy link
Contributor

vibhatha commented Jan 9, 2024

@zanmato1984 I will completely review once the test cases are there.

@zanmato1984
Copy link
Contributor Author

I tried to rebase and get things messy. Let me fix my branch first. Sorry for the trouble, guys :(

@zanmato1984
Copy link
Contributor Author

Hi @westonpace , I've addressed the comments you previously left except one thing I'm uncertain about in my last post. Would you please take a look again? Thanks!

@westonpace westonpace merged commit 0ce7267 into apache:main Mar 12, 2024
@westonpace westonpace removed the awaiting merge Awaiting merge label Mar 12, 2024
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 0ce7267.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Compute] Add residual predicate support to new (Swiss) hash join

3 participants