Allow table in-out functions to be used in correlated subqueries and as LATERAL queries #5485
Merged
Mytherin merged 16 commits into duckdb:master on Dec 21, 2022
…table function that does not support projection pushdown
…nformation gathering for cached operators
This PR allows table in-out functions to be used in correlated subqueries or as part of LATERAL joins, i.e. the following query now works:
For this to work, table in-out functions need to (partially) replicate their input. Currently this is implemented generically by running the table in-out function on a single input row at a time and pairing every output row it produces with that input row. While this works fine for input rows that produce a lot of output (e.g. large lists), it partially negates the benefits of vectorized processing when dealing with many small lists. We will likely want to create specialized function implementations to make this more efficient in the future.
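The row-at-a-time approach described above can be sketched as a toy model in Python. This is not DuckDB's C++ implementation; the names `unnest_in_out` and `lateral_apply` are hypothetical, and rows are modelled as plain tuples with the list to expand in the last column.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # one input row, modelled as a plain tuple (assumption)


def unnest_in_out(chunk: List[Row]) -> Iterator[Tuple]:
    """Toy table in-out function: emit one output row per list element.

    Assumes the list to expand is the last column of each input row.
    """
    for row in chunk:
        for element in row[-1]:
            yield (element,)


def lateral_apply(
    rows: Iterable[Row],
    in_out_function: Callable[[List[Row]], Iterator[Tuple]],
) -> Iterator[Tuple]:
    """Feed the function one input row at a time and attach that input row
    to every output row it produces."""
    for row in rows:
        # A chunk of size one: we always know which input row the output
        # belongs to, at the cost of losing batch (vectorized) processing.
        for out in in_out_function([row]):
            yield row + out


rows = [(1, [10, 20]), (2, []), (3, [30])]
result = list(lateral_apply(rows, unnest_in_out))
```

Because each call sees exactly one input row, the replication is trivial to get right for any function, which is why the approach is generic. The trade-off is visible in the loop: an input with many short lists makes one function call per row, which is the overhead the description mentions.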
Table in-out functions vs standard table functions
Note that this does not allow all table functions to work as lateral functions: it only works with table in-out functions. Most of our table functions are "standard" table functions, as that was the only kind of table function we had until recently. Standard table functions can only take constant parameters and hence cannot operate on arbitrary rows.
It is feasible to convert many of our standard table functions (such as generate_series, repeat, etc.) to table in-out functions. However, functions such as read_csv or read_parquet learn about their schema by looking at the input parameters and reading the given files. Converting them will not be as straightforward, as we do not know which files to read if the input parameters are not constant expressions. We will cross that bridge when we come to it :)
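To illustrate why a file-reading function is harder to convert, here is a toy model of schema discovery at bind time. It is only a sketch: `Constant`, `ColumnRef`, `bind_read_csv`, and the in-memory `FAKE_FILES` are hypothetical stand-ins and do not reflect DuckDB's actual binder or `read_csv` code.

```python
from typing import Dict, List

# Hypothetical in-memory "files" standing in for CSV files on disk.
FAKE_FILES: Dict[str, str] = {
    "a.csv": "id,name\n1,x\n",
    "b.csv": "ts,value,unit\n0,1.5,m\n",
}


class Constant:
    """A parameter whose value is known before the query runs."""

    def __init__(self, value: str) -> None:
        self.value = value


class ColumnRef:
    """A parameter that refers to a column, so its value differs per row."""

    def __init__(self, name: str) -> None:
        self.name = name


def bind_read_csv(path_expr) -> List[str]:
    """Determine the output column names before execution by reading the
    header of the file named by the parameter."""
    if not isinstance(path_expr, Constant):
        # With a per-row path there is no single file to inspect, so the
        # output schema cannot be fixed up front.
        raise ValueError("cannot bind: file path is not a constant expression")
    header = FAKE_FILES[path_expr.value].splitlines()[0]
    return header.split(",")
```

In this model, `bind_read_csv(Constant("a.csv"))` yields the column names of that one file, while a `ColumnRef` argument fails because different rows could name files with different schemas.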