Support unparsing UNION for distinct results
#75
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
The DataFusion SQL Unparser does not correctly roundtrip SQL queries that use
UNIONrather thanUNION ALL. For example the following query:Should result in a query that filters out duplicate rows from the final result. DataFusion handles this by adding a
LogicalPlan::Distinctnode as the parent of theLogicalPlan::Unionnode.However, this is currently unparsed to the following SQL:
That will cause incorrect results when executed, because the duplicate rows will not be filtered out.
Rationale for this change
Adds support for correctly round-tripping SQL queries that rely on the distinct properties of the
UNIONset operator.What changes are included in this PR?
Keeps track of whether a Distinct node is the direct parent of a Union node in the Unparser's QueryBuilder. If so, then when the Union node is being processed the SetModified is set to
Nonerather thanAll. That will cause the sqlparser AST to correctly emit theUNIONSQL operator.Are these changes tested?
Yes, I've added a new SQL roundtrip test that would have failed previously.
Are there any user-facing changes?
No