Relates #138888
While reviewing how unmapped_fields="load" is planned for subqueries and FORK, we noticed two things:
- It is unclear what unmapped field loading should do when an unmapped field is mentioned in only one branch. Should it also be loaded in the others? Probably yes.
- The resolution logic for Fork/UnionAll aligns the branches so that the union can be performed: all branches must have the same columns, so missing columns are filled with nulls. This seems to work against the planning for unmapped field loading and leads to bugs.
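To make the second point concrete, here is a minimal Python sketch of null-filling alignment. It is not the analyzer's actual code: `align_branches` and the dict-based branch model are hypothetical, and the way the null fill shadows a field that should have been loaded is only one plausible reading of the interaction.

```python
def align_branches(branches):
    """Give every branch the same columns, filling gaps with None (null)."""
    # Collect the union of all column names in first-seen order.
    all_columns = []
    for columns in branches:
        for name in columns:
            if name not in all_columns:
                all_columns.append(name)
    # A column that a branch does not produce is filled with None.
    return [{name: columns.get(name) for name in all_columns} for columns in branches]


# Hypothetical model: column name -> where the branch gets the value from.
branches = [
    {"emp_no": "index", "first_name": "index"},
    {"emp_no": "index", "first_name": "index", "unmapped_field": "loaded from _source"},
]
aligned = align_branches(branches)
```

In this model the first branch ends up with `unmapped_field` set to null rather than loaded, which is the kind of outcome that would conflict with loading the field in every branch.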
See comments on analyzer unmapped tests here:
To wrap this up, we should add more spec tests and extend or update the expectations of the existing analyzer unmapped tests.
Example queries for illustration:
SET unmapped_fields="load";
FROM (employees), (employees | KEEP *, unmapped_field)
Here, it is reasonable to say that unmapped_field should be loaded only in the second subquery. However, this implies that the planner knows employees may have extra unmapped fields and decides not to load them in the first subquery because of subquery isolation.
SET unmapped_fields="load";
FROM employees
| FORK
(WHERE true)
(KEEP *, unmapped_field)
Here, we should load unmapped_field in both branches, because they share the same root. Currently, this is broken.
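The intended FORK behavior can be sketched in Python as follows. This only illustrates the rule proposed above (an unmapped field referenced in any branch is loaded in all branches that share the root). The function name and the list-based inputs are hypothetical and do not correspond to analyzer code.

```python
def fields_to_load_per_branch(mapped_fields, branch_references):
    """Return, for each FORK branch, the unmapped fields it should load."""
    mapped = set(mapped_fields)
    # Gather every referenced name that is not in the mapping, across all branches.
    unmapped = []
    for references in branch_references:
        for name in references:
            if name not in mapped and name not in unmapped:
                unmapped.append(name)
    # Branches share the same root, so each one loads the full set.
    return [list(unmapped) for _ in branch_references]


# Branch 1 is (WHERE true), branch 2 is (KEEP *, unmapped_field).
result = fields_to_load_per_branch(
    ["emp_no", "first_name"],
    [[], ["unmapped_field"]],
)
```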
This is non-trivial. First, we should look at a few more examples to decide which behavior is right and which is wrong. Second, the UnionAll query plan node that represents the branches of subqueries inherits from Fork, so the code paths for analyzing these two kinds of queries get intermixed unless we are very careful.
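The inheritance pitfall can be shown with a small Python sketch. The classes below are simplified stand-ins that mirror only the names and the inheritance relationship described here, and `branches_share_root` is a hypothetical helper encoding the behavior suggested by the two example queries.

```python
class Fork:
    """Stand-in for the plan node produced by FORK."""

    def __init__(self, branches):
        self.branches = branches


class UnionAll(Fork):
    """Stand-in for the plan node that represents subquery branches."""


def branches_share_root(node):
    # The subclass check must come first: every UnionAll is also a Fork,
    # so testing for Fork first would treat subqueries like FORK branches.
    if isinstance(node, UnionAll):
        return False  # subqueries are isolated from each other
    if isinstance(node, Fork):
        return True  # FORK branches start from the same root
    raise TypeError(f"unexpected plan node: {type(node).__name__}")


fork_shares = branches_share_root(Fork([]))
union_shares = branches_share_root(UnionAll([]))
```

Any rule written against Fork alone silently applies to UnionAll as well, so each such rule has to state explicitly which of the two it means.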