-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Support AND operator as alias of array intersect function #8496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: veeupup <code@tanweime.com>
Weijun-H
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @Veeupup
| } | ||
| And => match left_data_type { | ||
| DataType::Boolean => boolean_op!(&left, &right, and_kleene), | ||
| DataType::List(_field) => array_intersect(&[left, right]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| DataType::List(_field) => array_intersect(&[left, right]), | |
| DataType::List(..) => array_intersect(&[left, right]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
| And | Or => if matches!((lhs, rhs), (Boolean | Null, Boolean | Null)) { | ||
| // Logical binary boolean operators can only be evaluated for | ||
| // boolean or null arguments. | ||
| // boolean or null arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // boolean or null arguments. | |
| // boolean or null arguments. |
| BinaryOperator::PGBitwiseShiftRight => Ok(Operator::BitwiseShiftRight), | ||
| BinaryOperator::PGBitwiseShiftLeft => Ok(Operator::BitwiseShiftLeft), | ||
| BinaryOperator::StringConcat => Ok(Operator::StringConcat), | ||
| BinaryOperator::PGOverlap => Ok(Operator::And), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres array overlap (&&) returns a boolean, which is not same as array_intersect/list_intersect in Duckdb. But looks like you are treating And (&&) as an alias like Duckdb. Maybe you should not parse a PGOverlap to And.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya I think PGOverlap is just a parsed token. What does this token do (return boolean or list_intersect) should be define latter on. It is fine to convert it to Operator::And.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw PGOverlap does not seem pg specific, although the comment said. It is also parsed in GenericDialect. Is the comment misleading or what? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No...this PR makes AND as an alias operator of array_intersect function when it is applied with List parameters. So this line actually means an BinaryOperator::PGOverlap will be treated as AND operator in DataFusion (i.e., AND in DataFusion is compatible with Postgres' PGOverlap operator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PGOverlap is just a parsed token. What does this token do (return boolean or list_intersect) should be define latter on. It is fine to convert it to Operator::And.
Doesn't this PR already define AND as an alias of array_intersect?
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres array overlap is the same as array intersect. Return common elements in two array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure?
https://www.postgresql.org/docs/9.6/functions-array.html
&& | overlap (have elements in common) | ARRAY[1,4,3] && ARRAY[2,1] | t
postgres=# select array[1, 2] && array[3, 4];
?column?
----------
f
(1 row)
postgres=# select array[1, 2, 3] && array[3, 4];
?column?
----------
t
(1 row)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the return value is different, both find if there is common between two.
However, whether we return array or boolean, we can still convert PGOverlap to Operator::Overlap for this operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the behavior like Duckdb between we can still get the follow on information based on the return value, but we can't find out the common in arrays from boolean value.
| ---- | ||
| [2, 3] [] [aa, cc] [true] [2.2, 3.3] [[2, 2], [3, 3]] | ||
|
|
||
| query ?????? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| query ?????? | |
| # AND operator as alias of array_intersect | |
| query ?????? |
| SELECT [1,2,3] && [2,3,4], | ||
| [1,3,5] && [2,4,6], | ||
| ['aa','bb','cc'] && ['cc','aa','dd'], | ||
| [true, false] && [true], | ||
| [1.1, 2.2, 3.3] && [2.2, 3.3, 4.4], | ||
| [[1, 1], [2, 2], [3, 3]] && [[2, 2], [3, 3], [4, 4]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is an alias, it is better to add array_intersect alongside to compare its value.
| SELECT [1,2,3] && [2,3,4], | |
| [1,3,5] && [2,4,6], | |
| ['aa','bb','cc'] && ['cc','aa','dd'], | |
| [true, false] && [true], | |
| [1.1, 2.2, 3.3] && [2.2, 3.3, 4.4], | |
| [[1, 1], [2, 2], [3, 3]] && [[2, 2], [3, 3], [4, 4]] | |
| SELECT [1,2,3] && [2,3,4], array_intersect([1,2,3], [2,3,4]), | |
| [1,3,5] && [2,4,6], array_intersect([1,3,5], [2,4,6]), | |
| ['aa','bb','cc'] && ['cc','aa','dd'], array_intersect(['aa','bb','cc'], ['cc','aa','dd']), | |
| [true, false] && [true], array_intersect([true, false], [true]), | |
| [1.1, 2.2, 3.3] && [2.2, 3.3, 4.4], array_intersect([1.1, 2.2, 3.3], [2.2, 3.3, 4.4]), | |
| [[1, 1], [2, 2], [3, 3]] && [[2, 2], [3, 3], [4, 4]], array_intersect([[1, 1], [2, 2], [3, 3]], [[2, 2], [3, 3], [4, 4]],), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! test has been fixed!
| } else if matches!((lhs, rhs), (List(_), List(_))) { | ||
| Ok(Signature::uniform(lhs.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to make sure left and right List have same element type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long term, we should have the same signature as BuiltinFunction::ArrayIntersect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reuse array_coercion to resolve the data type~
| } | ||
| And => match left_data_type { | ||
| DataType::Boolean => boolean_op!(&left, &right, and_kleene), | ||
| DataType::List(_field) => array_intersect(&[left, right]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if DataFusion already has some stuff to handle operator -> function aliasing. At least, we should parse/convert the AND operator to array_intersect function call at early stage, instead of silently turning to call array_intersect in binary physical operator. If we are going to support more aliasing operators like this, we will soon find these aliasing logic scatters everywhere in codebase and hard to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good suggestion. Like what we have done for MakeArry in sql_array_literal. I think we can do the similar conversion for other List operator in sql to logicExpr stage. Including array_concat, append prepend, array_has
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we make this change in the following PR? it seems that we need to refactor array_concat, append prepend, array_has too
Signed-off-by: veeupup <code@tanweime.com>
|
What is the status of this PR? |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #8483 .
Rationale for this change
What changes are included in this PR?
Are these changes tested?
sqllogictest
Are there any user-facing changes?