
Allow table in-out functions to be used in correlated subqueries and as LATERAL queries#5485

Merged
Mytherin merged 16 commits into duckdb:master from Mytherin:tableinoutlateral
Dec 21, 2022


@Mytherin
Collaborator

This PR allows table in-out functions to be used in correlated subqueries or as part of LATERAL joins. For example, the following query now works:

D SELECT * FROM (SELECT ARRAY[1, 2, 3]) t(l), UNNEST(l) t2(k) ORDER BY k;
┌───────────┬───────┐
│     l     │   k   │
│  int32[]  │ int32 │
├───────────┼───────┤
│ [1, 2, 3] │     1 │
│ [1, 2, 3] │     2 │
│ [1, 2, 3] │     3 │
└───────────┴───────┘

For this to work, table in-out functions need to (partially) replicate their input. Currently this is implemented generically: the table in-out function is run on a single input row at a time, and every output row it produces is paired with that input row. This works well for input rows that produce a lot of output (e.g. large lists), but it partially negates the benefits of vectorized processing when dealing with many small lists. We will likely want to create specialized function implementations to make this more efficient in the future.
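As a rough illustration of the row-at-a-time approach described above, here is a minimal Python sketch. It is not the C++ implementation from this PR; `unnest` and `lateral_apply` are hypothetical names used only for this example, and rows are modelled as plain tuples.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple  # one input row, e.g. (l,)


def unnest(row: Row) -> Iterator[Tuple]:
    """Toy in-out function: emit one output row per element of the list in column 0."""
    for item in row[0]:
        yield (item,)


def lateral_apply(
    rows: Iterable[Row],
    in_out_fn: Callable[[Row], Iterable[Tuple]],
) -> Iterator[Tuple]:
    """Run in_out_fn on one input row at a time and replicate that input row
    next to every output row it produces."""
    for row in rows:
        for out in in_out_fn(row):
            # The input columns are repeated for each output row.
            yield row + out


# Mirrors the query above: a single input row holding the list [1, 2, 3].
result = sorted(lateral_apply([([1, 2, 3],)], unnest), key=lambda r: r[-1])
for r in result:
    print(r)
```

With many short lists, the inner function is invoked once per input row and yields only a handful of rows each time, which is the per-row overhead mentioned above.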

Table in-out functions vs standard table functions

Note that this does not allow all table functions to work as lateral functions: it only works with table in-out functions. Most of our table functions are "standard" table functions, as that was the only kind of table function we had until recently. Standard table functions can only take constant parameters and hence cannot operate on arbitrary rows.

It is feasible to convert many of our standard table functions to table in-out functions (such as generate_series, repeat, etc). However, functions such as read_csv or read_parquet learn about their schema by looking at the input parameters and reading the given files. Converting them will not be as straightforward, as we do not know which files to read if the input parameters are not constant expressions. We will cross that bridge when we come to it :)
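To make the distinction concrete, here is a hedged Python sketch of what converting a function like generate_series might look like conceptually. Both function names are hypothetical and this is not DuckDB code; the only assumption carried over is that generate_series includes its upper bound.

```python
from typing import Iterable, Iterator, Tuple


def generate_series_constant(start: int, stop: int) -> Iterator[Tuple[int]]:
    """'Standard' style: arguments are constants known before execution."""
    for value in range(start, stop + 1):  # inclusive upper bound
        yield (value,)


def generate_series_in_out(rows: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int]]:
    """'In-out' style: arguments arrive as input rows, one (start, stop) pair each."""
    for start, stop in rows:
        for value in range(start, stop + 1):
            yield (value,)


constant_result = list(generate_series_constant(1, 3))
in_out_result = list(generate_series_in_out([(1, 2), (10, 11)]))
print(constant_result)
print(in_out_result)
```

In both variants the output schema is a single integer column regardless of the argument values, so it can be fixed before any row is seen. A function like read_csv has no such luxury: its columns depend on the file named by the argument, which is why non-constant parameters are harder to support there.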
