Skip to content

opt: eliminate unnecessary index join inside project#53586

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:eliminate-index-join-inside-project
Sep 3, 2020
Merged

opt: eliminate unnecessary index join inside project#53586
craig[bot] merged 1 commit intocockroachdb:masterfrom
mgartner:eliminate-index-join-inside-project

Conversation

@mgartner
Copy link
Copy Markdown
Contributor

This commit adds a new rule, EliminateIndexJoinInsideProject, that
removes an IndexJoin inside a Project when the IndexJoin's input
produces all the columns needed by the Project's passthrough and
projection columns. See the comments for the new rule for a detailed
example.

Fixes #51018

Release justification: This is a low-risk update to new functionality,
partial indexes.

Release note: None

@mgartner mgartner requested a review from a team as a code owner August 28, 2020 00:33
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@mgartner
Copy link
Copy Markdown
Contributor Author

I can't think of a case outside of partial indexes (and inverted indexes, see TODO in diff) where this rule would fire, so I think that makes it low-risk. However, if there are cases where this rule fires and we consider it not low-risk, this can wait until a future release.

@mgartner mgartner force-pushed the eliminate-index-join-inside-project branch from da65da7 to feefd01 Compare August 28, 2020 00:40
@mgartner
Copy link
Copy Markdown
Contributor Author

Note for me: Rebase and fix xform/testdata/rules/limit tests after #53581 is merged.

Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: (but wait for @RaduBerinde to sign off)

It would be cool if we could also have a similar rule that would apply to LookupJoins that are acting like an IndexJoin (or did @DrewKimball already make a rule to eliminate those?). I don't think we should do that now during the stability period, but maybe add a TODO somewhere?

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@RaduBerinde
Copy link
Copy Markdown
Member

This fires for unnecessary index joins with inverted indexes (as seen in some tests). It looks correct though, because (since very recently) we use a new ColumnID for the index "column".

I am ok with this going in (it's a low risk improvement for both existing and new functionality) , but we may want to add some more specific tests with inverted indexes (and I'm sure there's a TODO about this somewhere that we can remove).

@mgartner
Copy link
Copy Markdown
Contributor Author

It looks correct though, because (since very recently) we use a new ColumnID for the index "column".

@RaduBerinde Can you explain what you mean by this? Why wouldn't this have been safe before this change?

@RaduBerinde
Copy link
Copy Markdown
Member

In principle the scan of an inverted index can return the inverted index key as a column. I am not sure this happens currently for JSON but it can definitely happen for geospatial. Until my recent PR, this "column" had the same ColumnID as the "real" table column. So you could conceivably have an index join that looks unnecessary, but that's actually replacing the incomplete inverted key value with the real column value. I am not 100% sure this would occur in practice currently so maybe there's nothing to do here. I'd just ensure that we have tests like SELECT j FROM t WHERE j <@ ... and those still do an index join.

@mgartner mgartner force-pushed the eliminate-index-join-inside-project branch from feefd01 to 69f3c90 Compare August 31, 2020 23:10
Copy link
Copy Markdown
Contributor Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool if we could also have a similar rule that would apply to LookupJoins that are acting like an IndexJoin (or did @DrewKimball already make a rule to eliminate those?). I don't think we should do that now during the stability period, but maybe add a TODO somewhere?

@rytaft do you have an example? I couldn't find any tests that show what you're describing.

In principle the scan of an inverted index can return the inverted index key as a column. I am not sure this happens currently for JSON but it can definitely happen for geospatial. Until my recent PR, this "column" had the same ColumnID as the "real" table column. So you could conceivably have an index join that looks unnecessary, but that's actually replacing the incomplete inverted key value with the real column value. I am not 100% sure this would occur in practice currently so maybe there's nothing to do here. I'd just ensure that we have tests like SELECT j FROM t WHERE j <@ ... and those still do an index join.

Got it. I've added a test with a comment to make this case explicit. I don't think it was possible because inverted index scans seem to "know" that they can only output the primary key columns, and not the column they index.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented Sep 1, 2020

do you have an example?

Hmm I thought that maybe there might be a case from the GenerateLookupJoins rule, since the final index join (if needed) is actually a lookup join with the primary index. But I can't find a specific example right now where we apply the lookup join and it's not needed.

@mgartner mgartner force-pushed the eliminate-index-join-inside-project branch from 69f3c90 to 0e611f9 Compare September 1, 2020 19:01
@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Sep 1, 2020

Hmm I thought that maybe there might be a case from the GenerateLookupJoins rule, since the final index join (if needed) is actually a lookup join with the primary index. But I can't find a specific example right now where we apply the lookup join and it's not needed.

I'll keep it in mind and if I come across a case we can add support for it. Without an existing test case, I'm not sure where I would put a TODO, so I'm inclined to forgo it for now.

@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented Sep 1, 2020

Sounds good to me

@mgartner mgartner requested a review from RaduBerinde September 1, 2020 22:32
This commit adds a new rule, EliminateIndexJoinInsideProject, that
removes an IndexJoin inside a Project when the IndexJoin's input
produces all the columns needed by the Project's passthrough and
projection columns. See the comments for the new rule for a detailed
example.

Release justification: This is a low-risk update to new functionality,
partial indexes.

Release note: None
@mgartner mgartner force-pushed the eliminate-index-join-inside-project branch from 0e611f9 to ae2a9e0 Compare September 2, 2020 16:17
@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Sep 2, 2020

but we may want to add some more specific tests with inverted indexes (and I'm sure there's a TODO about this somewhere that we can remove).

Done.

@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Sep 3, 2020

@RaduBerinde Do you sign-off on this?

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mgartner
Copy link
Copy Markdown
Contributor Author

mgartner commented Sep 3, 2020

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 3, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: unconstrained partial index scans should not add index joins when remaining filters prune columns

4 participants