-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Enhance: simplify x=x --> x IS NOT NULL OR NULL
#15589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This optimizer transformation doesn't look right for projected columns like However it makes sense if it is only applied on filter expressions. |
Thank you for review. I'll make the change |
18e4bf0 to
7a17897
Compare
|
@2010YOUY01 Instead of applying transformation on filter expression, I adjusted the rule to transform x=x into About the performance, I'm not 100% sure whether this rule worth the change, but I made this change because |
I recommend writing a simple benchmark by criterion or other tools to get more reliable results instead of measuring the CLI execution. But I agree that this should be more performant |
|
We could also use a CASE CASE x IS NOT NULL THEN true ELSE null END |
|
I expect this to make a large performance difference when x is a string type (as string comparisons are fairly expensive) Thank you for this PR @ding-young and the great reviews @2010YOUY01 and @berkaysynnada |
|
I wrote a simple criterion bench (https://github.com/ding-young/datafusion/blob/simplify-trivial-eq-bench/datafusion/core/benches/simplify_trivial_eq.rs) and the result is as follows. Plus, since Thanks for the review! Trying out the bench was also a great experience for me. |
Thank you for sharing the results. It doesn't effect much here but next time you can exclude the create_context() part out of bench_function :) |
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
7a17897 to
ce1089b
Compare
kosiew
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 @ding-young
Left a few suggestions.
ce1089b to
44906b2
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ding-young @kosiew @2010YOUY01 and @berkaysynnada -- go team 🦾 !
| 06)----------Filter: __common_expr_3 = __common_expr_3 | ||
| 07)------------Projection: substr(CAST(md5(CAST(tmp_table.value AS Utf8)) AS Utf8), Int64(1), Int64(32)) AS __common_expr_3 | ||
| 08)--------------TableScan: tmp_table projection=[value] | ||
| 05)--------Projection: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a comment a few lines above that says:
TODO: this should probably be possible to completely remove the filter as always true?
We can probably update that too -- but we could do it in a follow on PR as well
x=x --> x IS NOT NULL OR NULL
- if x is not nullable, x=x -> true - else, x=x -> x is NOT NULL OR NULL
44906b2 to
115c8a2
Compare
| # The optimizer does not currently eliminate the filter; | ||
| # Instead, it's rewritten as `IS NULL OR NOT NULL` due to SQL null semantics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment in slt as well :)
|
Thanks again @ding-young 🙏 |
- if x is not nullable, x=x -> true - else, x=x -> x is NOT NULL OR NULL

Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This pr adds a rule in ExprSimplifier to simplify x=x as follows.
Are these changes tested?
Yes, via unit test and slt
Are there any user-facing changes?