[Table In-Out Function] Add the option to write the mapping of output to input row #6636
Tishj wants to merge 26 commits into duckdb:main
Conversation
…to re-use a datachunk across multiple runs
While writing this, I realized I need to do some extra work on the vector -> selection vector transformation. If the extra vector is returned to us as a Constant Vector, we can use this to avoid the selection vector + flatten step.
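As a rough illustration of the shortcut described above, here is a standalone sketch that does not use DuckDB's actual `Vector` or `SelectionVector` classes. `MappingVector` and `ProjectWithMapping` are hypothetical names made up for this example. When the mapping is constant, every output row comes from the same input row, so the input value can be broadcast instead of gathered row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified mapping vector (not a DuckDB type):
// either one input index shared by all output rows (constant),
// or one input index per output row (flat).
struct MappingVector {
    bool is_constant;
    std::vector<uint32_t> data; // exactly one entry when is_constant is true
};

// Project one input column into `output_count` output rows.
std::vector<int64_t> ProjectWithMapping(const std::vector<int64_t> &input_column,
                                        const MappingVector &mapping,
                                        std::size_t output_count) {
    if (mapping.is_constant) {
        // Constant case: all output rows share one input row, so no
        // per-row selection is needed; broadcast the single value.
        assert(mapping.data.size() == 1);
        assert(mapping.data[0] < input_column.size());
        return std::vector<int64_t>(output_count, input_column[mapping.data[0]]);
    }
    // Flat case: gather the input value for every output row.
    assert(mapping.data.size() == output_count);
    std::vector<int64_t> result;
    result.reserve(output_count);
    for (uint32_t input_row : mapping.data) {
        assert(input_row < input_column.size());
        result.push_back(input_column[input_row]);
    }
    return result;
}
```

In this sketch the constant branch stands in for skipping the selection vector + flatten step; how the real operator represents that case is not shown here.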
… vector type of produced mapping vector in UNNEST to constant - since we only emit one unnested row at a time
Mytherin left a comment
Thanks for the PR! Looks good. Some comments:
src/execution/operator/projection/physical_tableinout_function.cpp (three review threads, outdated and resolved)
Ah great... my clang-format is too new for the CI, so it's failing the format check on changes suggested by my own clang-format.
…TableInOutFunction::FinalExecute
Hey @Tishj, is this something you'd like to merge? If so, could you un-draft it?
@hannes It is, but the options this opens up aren't really utilized in this branch. That's why the benchmark looks quite pitiful.
As this seems to be superseded by #9014, I will close this one for now.
When dealing with table in-out functions, decisions are made based on the input, so we can't know beforehand how many output tuples a given input tuple will produce.
This PR adds the ability for the table in-out function to write an extra UINT32_T vector that contains, for each output tuple, the index of the input tuple that produced it.
In the physical tableinout operator we can use this information to avoid resorting to tuple-at-a-time execution when the input needs to be projected into the output tuples (which is necessary for some operators, like LATERAL joins).
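To illustrate the idea, here is a minimal sketch in plain C++ that does not use DuckDB's `DataChunk` or `Vector` types. `ProjectInputColumn` is a hypothetical helper written for this example. Given the output-to-input mapping, an input column can be projected into all output rows of a batch in one pass, even when those rows come from several different input tuples.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of DuckDB): input_index[i] holds the index
// of the input row that produced output row i. The returned column carries
// the matching input value for every output row.
std::vector<int64_t> ProjectInputColumn(const std::vector<int64_t> &input_column,
                                        const std::vector<uint32_t> &input_index) {
    std::vector<int64_t> projected;
    projected.reserve(input_index.size());
    for (uint32_t input_row : input_index) {
        // Each mapping entry must refer to a row of the current input batch.
        assert(input_row < input_column.size());
        projected.push_back(input_column[input_row]);
    }
    return projected;
}
```

For example, with an input column `{10, 20, 30}` and a mapping `{0, 0, 2, 2, 2}` (input row 0 produced two tuples, row 1 produced none, row 2 produced three), the projected column is `{10, 10, 30, 30, 30}`. Without the mapping, the operator has to feed the function one input tuple at a time to know which input values belong to which output rows.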
Added a benchmark to compare the performance against the old implementation (which is still used for functions that don't produce the mapping).
current master:
this branch:
Note: the similarity in these results is also caused by the current implementation of UNNEST only unnesting a single row at a time, so the strength of this change (being able to deal with outputs from multiple input tuples at once) is not utilized.