Skip to content

Table references in subqueries cause invalid references in top-level projection #13027

@peasee

Description

@peasee

Describe the bug

SQL with table references in a subquery:

SELECT j1_id FROM (SELECT ta.j1_id AS j1_id FROM j1 ta)

gets re-written in both GenericDialect and MySqlDialect with a table reference that does not exist.

GenericDialect:

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)

MySqlDialect:

SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`

For Datafusion, the behavior of GenericDialect still executes. However, DuckDB with GenericDialect and MySQL with MySqlDialect both fail. I have not tested additional databases.

To Reproduce

Run a query that contains a subquery with a table reference that is not used in the top-level projection, like:

SELECT j1_id FROM (SELECT ta.j1_id AS j1_id FROM j1 ta)

Datafusion will re-write it and include the ta table reference in the top-level project, which does not exist outside of the subquery.

Expected behavior

Datafusion should probably remove the non-existent table reference from the top-level project. Alternatively, make subquery/derived table aliases mandatory for all dialects and re-write the table reference to the alias of the subquery like derived_projection.j1_id.

Additional context

Discussion where the bug was originally identified: #12994

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions