Skip to content

Conversation

@yyy1000
Copy link
Contributor

@yyy1000 yyy1000 commented May 28, 2024

Which issue does this PR close?

Closes #10663.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label May 28, 2024
Comment on lines +290 to +298
match &on.sort_expr {
Some(sort_expr) => {
if let Some(query_ref) = query {
query_ref
.order_by(self.sort_to_sql(sort_expr.clone())?);
} else {
return internal_err!(
"Sort operator only valid in a statement context."
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing order by in a distinct on.
I found the order by should be added to query instead of select.
So my question is, what's the usage of sort_by in SelectBuilder? I searched the code but not found an usage. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is to support the postgres style DISTINCT ON, which may not have all the columns of the select list

Perhaps @gruuya knows more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order_by/sort_by clause is fetched from the sqlparser::ast::Query struct while parsing the DISTINCT ON query, so it makes sense that it is added there when de-parsing

if let LogicalPlan::Distinct(Distinct::On(ref distinct_on)) = plan {
// In case of `DISTINCT ON` we must capture the sort expressions since during the plan
// optimization we're effectively doing a `first_value` aggregation according to them.
let distinct_on = distinct_on.clone().with_sort_expr(order_by)?;
Ok(LogicalPlan::Distinct(Distinct::On(distinct_on)))

Indeed sqlparser::ast::Selects sort_by is not being used in DataFusion

if !select.sort_by.is_empty() {
return not_impl_err!("SORT BY");
}

so it also seems that SelectBuilder::sort_by is not needed atm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Yeah, by running test I found sort_expr will be added to distinct_on plan, but it will not be added to sort.🤔
Do you think it's better to remove SelectBuilder::sort_by or keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's better to remove SelectBuilder::sort_by or keep it?

If you're asking me I think it's fine to keep it (though other projects wouldn't be using it as well since SelectBuilder is pub(super), so it wouldn't hurt to remove it either).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @yyy1000 -- this looks very close. I had one additional test I think we should add but otherwise this looks ready to go to me

Comment on lines +290 to +298
match &on.sort_expr {
Some(sort_expr) => {
if let Some(query_ref) = query {
query_ref
.order_by(self.sort_to_sql(sort_expr.clone())?);
} else {
return internal_err!(
"Sort operator only valid in a statement context."
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is to support the postgres style DISTINCT ON, which may not have all the columns of the select list

Perhaps @gruuya knows more

"select CAST(id/2 as VARCHAR) NOT LIKE 'foo*' from person where NOT EXISTS (select ta.j1_id, tb.j2_string from j1 ta join j2 tb on (ta.j1_id = tb.j2_id))",
r#"select "First Name" from person_quoted_cols"#,
"select DISTINCT id FROM person;",
"select DISTINCT on (id) id, first_name from person order by id;",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please add a test like this (without an order by)?

select DISTINCT on (id) id, first_name from person;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks for your review. :)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @yyy1000 and @gruuya

@alamb alamb merged commit 156a525 into apache:main May 28, 2024
@yyy1000 yyy1000 deleted the distinct branch May 29, 2024 03:00
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* support distinct

* cargo fmt

* better fmt

* add support for order by

* add another test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Suport unparsing LogicalPlan::Distinct to DISTINCT

3 participants