Skip to content

Conversation

@phillipleblanc
Copy link

Which issue does this PR close?

The DataFusion SQL Unparser does not correctly roundtrip SQL queries that use UNION rather than UNION ALL. For example the following query:

SELECT col1 FROM footable
UNION
SELECT col1 FROM bartable

Should result in a query that filters out duplicate rows from the final result. DataFusion handles this by adding a LogicalPlan::Distinct node as the parent of the LogicalPlan::Union node.

Distinct:
  Union:
    TableScan
    TableScan

However, this is currently unparsed to the following SQL:

SELECT col1 FROM footable
UNION ALL
SELECT col1 FROM bartable

That will cause incorrect results when executed, because the duplicate rows will not be filtered out.

  • Closes #.

Rationale for this change

Adds support for correctly round-tripping SQL queries that rely on the distinct properties of the UNION set operator.

What changes are included in this PR?

Keeps track of whether a Distinct node is the direct parent of a Union node in the Unparser's QueryBuilder. If so, then when the Union node is being processed the SetModified is set to None rather than All. That will cause the sqlparser AST to correctly emit the UNION SQL operator.

Are these changes tested?

Yes, I've added a new SQL roundtrip test that would have failed previously.

Are there any user-facing changes?

No

@phillipleblanc phillipleblanc self-assigned this Apr 22, 2025
@phillipleblanc phillipleblanc merged commit bfde649 into spiceai-45 Apr 22, 2025
@phillipleblanc phillipleblanc deleted the phillip/250422-unparse-union-distinct branch April 22, 2025 16:12
phillipleblanc added a commit that referenced this pull request Apr 25, 2025
UPSTREAM NOTE: This was submitted upstream:
apache#15814 and will be in DF 48
phillipleblanc added a commit that referenced this pull request May 7, 2025
UPSTREAM NOTE: This was submitted upstream:
apache#15814 and will be in DF 48
sgrebnov pushed a commit that referenced this pull request May 22, 2025
UPSTREAM NOTE: This was submitted upstream:
apache#15814 and will be in DF 48
sgrebnov pushed a commit that referenced this pull request May 26, 2025
UPSTREAM NOTE: This was submitted upstream:
apache#15814 and will be in DF 48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants