Skip to content

Fix nested joins#13742

Merged
mergify[bot] merged 2 commits intomasterfrom
mkleen/join-ordering-v4
Mar 14, 2023
Merged

Fix nested joins#13742
mergify[bot] merged 2 commits intomasterfrom
mkleen/join-ordering-v4

Conversation

@mkleen
Copy link
Contributor

@mkleen mkleen commented Feb 28, 2023

This makes nested joins work such as:

SELECT *
    FROM j1
    JOIN (j2 JOIN j3 ON j2.x = j3.x)
    ON j1.x = j2.x;

by adding a JoinRelation which is used to determine the correct JoinPairs.

fixes #13503

Summary of the changes / Why this improves CrateDB

Checklist

  • Added an entry in CHANGES.txt for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in CHANGES.txt
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch 8 times, most recently from d157d04 to 7849d46 Compare March 2, 2023 13:29
@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch 2 times, most recently from f773b9c to bb8d749 Compare March 2, 2023 14:03
@mkleen mkleen marked this pull request as ready for review March 2, 2023 14:10
@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch from bb8d749 to 8030659 Compare March 2, 2023 14:39
Copy link
Contributor

@BaurzhanSakhariev BaurzhanSakhariev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks!

Left a question and minor comments.

@mkleen mkleen added the v/5.2 label Mar 2, 2023
@mkleen
Copy link
Contributor Author

mkleen commented Mar 3, 2023

Thanks for the feedback.
This needs one more iteration, also the changelog entry is missing.

@matriv
Copy link
Contributor

matriv commented Mar 3, 2023

@mkleen Thank you for the work on this, please ping and I will also review once you complete the new iteration.

@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch 2 times, most recently from 6a15641 to 179b8f5 Compare March 6, 2023 16:09
@mkleen mkleen changed the title Make nested joins work Fix nested joins Mar 6, 2023
@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch from abbffad to 410d109 Compare March 7, 2023 08:09
@mkleen
Copy link
Contributor Author

mkleen commented Mar 7, 2023

Ready for another review.

@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch from 410d109 to ecf4b2f Compare March 7, 2023 08:32
Copy link
Contributor

@BaurzhanSakhariev BaurzhanSakhariev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Thanks for covering extra cases!

LGTM, added a minor question on tests

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, also left some comments, and also:
Can we have some more complex testing with more levels of nesting and to also have same tables reused, e.g.:

SELECT ....
FROM t1 JOIN (t2 JOIN (t3 JOIN t1 ON ...) t3 ON ...)

@mkleen
Copy link
Contributor Author

mkleen commented Mar 7, 2023

Thanks for the feedback. I am wondering if it maybe makes sense to split this into two prs. One to resolve the nested join issue, the other one for using.

@matriv
Copy link
Contributor

matriv commented Mar 7, 2023

Thanks for the feedback. I am wondering if it maybe makes sense to split this into two prs. One to resolve the nested join issue, the other one for using.

If it's not too much trouble, why not, we can have 2 separate changelog entries as well.

@mkleen
Copy link
Contributor Author

mkleen commented Mar 7, 2023

So the initial issue was the following query, which works now.

SELECT 
  * 
FROM 
  j1 
  JOIN (
    j2 
    JOIN j3 ON j2.x = j3.x
  ) ON j1.x = j2.x

However, we can currently not go infinitely nested. The following query does not work and I would consider this out-of-scope for a fix. This would require significant changes in the way joins are treated internally. This is anyway not a common use case.

SELECT 
  y, 
  z 
FROM 
  a 
  JOIN (
    b 
    JOIN (
      c 
      JOIN a ON c.z = a.x
    ) temp1 ON b.y = temp1.x
  ) temp2 on a.x = temp2.x;

My suggestion is to throw a reasonable error message and leave it as is. @matriv @mfussenegger @BaurzhanSakhariev

@matriv
Copy link
Contributor

matriv commented Mar 7, 2023

It's fine to address the most complex nesting later as a feature.
For the simpler nesting can we do something like this?

FROM 
  j1 
  JOIN (
    j2 
    JOIN j3 ON j2.x = j1.x AND j3.x = j2.x
  ) ON j1.x = j2.x

@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch 6 times, most recently from ba2e368 to 54ed6ad Compare March 14, 2023 09:46
@mkleen
Copy link
Contributor Author

mkleen commented Mar 14, 2023

Ready for another review. I split this pr now. The join using stuff goes into a follow-up.

Copy link
Contributor

@BaurzhanSakhariev BaurzhanSakhariev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Thanks! Added a minor comment

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Very nice to have a nice error message for the unsupported cases, instead of an internal error.

  • Left some very minor comments, no need to address
  • Does it worth mentioning the improved error message in the changelog entry?

This makes nested joins work such as:

SELECT *
    FROM j1
    JOIN (j2 JOIN j3 ON j2.x = j3.x)
    ON j1.x = j2.x;

by adding a JoinRelation which is used to determine the correct JoinPairs.

Fixes #13503
@mkleen mkleen force-pushed the mkleen/join-ordering-v4 branch from fbbde00 to c32ddf8 Compare March 14, 2023 11:28
@mkleen mkleen added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Mar 14, 2023
@mergify mergify bot merged commit e9624a9 into master Mar 14, 2023
@mergify mergify bot deleted the mkleen/join-ordering-v4 branch March 14, 2023 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Let Mergify merge the PR once approved and checks pass v/5.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Nested JOIN leads to IllegalStateException: joinPairs contains duplicate

4 participants