-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Support LogicalPlan::Distinct in unparser #10690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| match &on.sort_expr { | ||
| Some(sort_expr) => { | ||
| if let Some(query_ref) = query { | ||
| query_ref | ||
| .order_by(self.sort_to_sql(sort_expr.clone())?); | ||
| } else { | ||
| return internal_err!( | ||
| "Sort operator only valid in a statement context." | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When implementing order by in a distinct on.
I found the order by should be added to query instead of select.
So my question is, what's the usage of sort_by in SelectBuilder? I searched the code but not found an usage. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is to support the postgres style DISTINCT ON, which may not have all the columns of the select list
Perhaps @gruuya knows more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order_by/sort_by clause is fetched from the sqlparser::ast::Query struct while parsing the DISTINCT ON query, so it makes sense that it is added there when de-parsing
datafusion/datafusion/sql/src/query.rs
Lines 132 to 136 in 3dc1773
| if let LogicalPlan::Distinct(Distinct::On(ref distinct_on)) = plan { | |
| // In case of `DISTINCT ON` we must capture the sort expressions since during the plan | |
| // optimization we're effectively doing a `first_value` aggregation according to them. | |
| let distinct_on = distinct_on.clone().with_sort_expr(order_by)?; | |
| Ok(LogicalPlan::Distinct(Distinct::On(distinct_on))) |
Indeed sqlparser::ast::Selects sort_by is not being used in DataFusion
datafusion/datafusion/sql/src/select.rs
Lines 68 to 70 in 3dc1773
| if !select.sort_by.is_empty() { | |
| return not_impl_err!("SORT BY"); | |
| } |
so it also seems that
SelectBuilder::sort_by is not needed atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Yeah, by running test I found sort_expr will be added to distinct_on plan, but it will not be added to sort.🤔
Do you think it's better to remove SelectBuilder::sort_by or keep it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's better to remove
SelectBuilder::sort_byor keep it?
If you're asking me I think it's fine to keep it (though other projects wouldn't be using it as well since SelectBuilder is pub(super), so it wouldn't hurt to remove it either).
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @yyy1000 -- this looks very close. I had one additional test I think we should add but otherwise this looks ready to go to me
| match &on.sort_expr { | ||
| Some(sort_expr) => { | ||
| if let Some(query_ref) = query { | ||
| query_ref | ||
| .order_by(self.sort_to_sql(sort_expr.clone())?); | ||
| } else { | ||
| return internal_err!( | ||
| "Sort operator only valid in a statement context." | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is to support the postgres style DISTINCT ON, which may not have all the columns of the select list
Perhaps @gruuya knows more
| "select CAST(id/2 as VARCHAR) NOT LIKE 'foo*' from person where NOT EXISTS (select ta.j1_id, tb.j2_string from j1 ta join j2 tb on (ta.j1_id = tb.j2_id))", | ||
| r#"select "First Name" from person_quoted_cols"#, | ||
| "select DISTINCT id FROM person;", | ||
| "select DISTINCT on (id) id, first_name from person order by id;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please add a test like this (without an order by)?
select DISTINCT on (id) id, first_name from person;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks for your review. :)
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* support distinct * cargo fmt * better fmt * add support for order by * add another test
Which issue does this PR close?
Closes #10663.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?