Skip to content

[Table In-Out Function] Add the option to write the mapping of output to input row#6636

Closed
Tishj wants to merge 26 commits intoduckdb:mainfrom
Tishj:scalar_table_function
Closed

[Table In-Out Function] Add the option to write the mapping of output to input row#6636
Tishj wants to merge 26 commits intoduckdb:mainfrom
Tishj:scalar_table_function

Conversation

@Tishj
Copy link
Contributor

@Tishj Tishj commented Mar 9, 2023

When dealing with table in-out functions decisions are made based on the input, so we can't know beforehand how many tuples a given input tuple will produce.

This PR adds the ability for the table in-out function to write an extra UINT32_T vector, that contains the index of the input tuple that produced the tuple.

In the physical tableinout operator we can use this information to avoid resorting to tuple-at-a-time execution when the input needs to be projected into the output tuples (which is necessary for some operators, like LATERAL joins)

Added a benchmark to test the performance against the old implementation (that is still there for functions that don't produce the mapping)
current master:

benchmark/micro/join/lateraljoin_unnest.benchmark       1       0.947857
benchmark/micro/join/lateraljoin_unnest.benchmark       2       0.983878
benchmark/micro/join/lateraljoin_unnest.benchmark       3       0.963106
benchmark/micro/join/lateraljoin_unnest.benchmark       4       0.966599
benchmark/micro/join/lateraljoin_unnest.benchmark       5       0.990850

benchmark/micro/join/lateraljoin_unnest.benchmark       1       1.003941
benchmark/micro/join/lateraljoin_unnest.benchmark       2       1.013164
benchmark/micro/join/lateraljoin_unnest.benchmark       3       1.036013
benchmark/micro/join/lateraljoin_unnest.benchmark       4       0.983405
benchmark/micro/join/lateraljoin_unnest.benchmark       5       0.986953

this branch:

benchmark/micro/join/lateraljoin_unnest.benchmark       1       0.870611
benchmark/micro/join/lateraljoin_unnest.benchmark       2       0.896537
benchmark/micro/join/lateraljoin_unnest.benchmark       3       0.973802
benchmark/micro/join/lateraljoin_unnest.benchmark       4       0.986397
benchmark/micro/join/lateraljoin_unnest.benchmark       5       0.988581

benchmark/micro/join/lateraljoin_unnest.benchmark       1       0.918688
benchmark/micro/join/lateraljoin_unnest.benchmark       2       0.894599
benchmark/micro/join/lateraljoin_unnest.benchmark       3       0.967989
benchmark/micro/join/lateraljoin_unnest.benchmark       4       0.938711
benchmark/micro/join/lateraljoin_unnest.benchmark       5       0.942270

Note: the similarity in these results is also caused by the current implementation of UNNEST only unnesting a single row at a time, so the strength of this change - being able to deal with outputs from multiple input tuples at once - are not utilized

@Tishj
Copy link
Contributor Author

Tishj commented Mar 9, 2023

While writing this, I realized I need to do some extra work on the vector -> selection vector transformation.
Currently I'm assuming this to be a FLAT vector, so we can directly use the buffer and then Flatten the resulting projected vectors.
But since this DataChunk is given to the user, they can do whatever they want with it.

If the extra vector is returned to us as a Constant Vector, we can use this to avoid the selection vector + flatten step.
If it's not a Flat vector, we need to create a Selection Vector, but can avoid Flatten-ing the result.

Tishj added 3 commits March 9, 2023 11:44
… vector type of produced mapping vector in UNNEST to constant - since we only emit one unnested row at a time
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good. Some comments:

@Tishj
Copy link
Contributor Author

Tishj commented Mar 14, 2023

Ah great.. my clang-format is too new for the CI, it's failing the formatter on changes suggested by my clang-format

@github-actions github-actions bot marked this pull request as draft August 3, 2023 17:12
@hannes
Copy link
Member

hannes commented Aug 16, 2023

Hey @Tishj is something you'd like to merge? If so, could you un-draft it?

@Tishj
Copy link
Contributor Author

Tishj commented Aug 16, 2023

@hannes It is, but the options this opens up aren't really utilized in this branch, that's why the benchmark looks quite pitiful.
So I'm hesitant to merge it currently because it's not stress tested enough.

@Mytherin
Copy link
Collaborator

Mytherin commented Nov 8, 2023

As this seems to be superseded by #9014, I will close this one for now

@Mytherin Mytherin closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants