[SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression#44261
[SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression#44261dbatomic wants to merge 22 commits intoapache:masterfrom
Conversation
| @@ -79,30 +79,22 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |||
| } | |||
|
|
|||
| test("datetime function current_date") { | |||
There was a problem hiding this comment.
This test and below are supposed to be positive tests. Do we have positive tests for the functions current_date(), current_timestamp(), localtimestamp() in Spark code base?
There was a problem hiding this comment.
Actually these tests were not useful at all previously, as they are testing dead code. We should check ComputeCurrentTimeSuite and make sure it has good test coverage for the execution part.
|
After looking at test failures, there seem to be 'minor' correctness bug for time aware expressions that rely on select b from values (
'one', current_timestamp),
'two', current_timestamp),
'three', current_timestamp as data(a, b)current_timestamp will not be replaced. Instead, it will be evaluated every time which means that these three rows will have different values (tested locally). This happens because @cloud-fan and @MaxGekk as FYI. |
@dbatomic Could you open a separate PR with a bug fix. In this way, we can backport it without your changes in this PR. |
dtenedor
left a comment
There was a problem hiding this comment.
In general this approach looks good, I just have a couple comments.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala
Outdated
Show resolved
Hide resolved
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
Outdated
Show resolved
Hide resolved
...talyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala
Outdated
Show resolved
Hide resolved
|
LGTM except for #44261 (comment) . Can we verify the test coverage for current datetime functions in |
Hopefully everything looks fine now. |
| val t2 = UnixTimestamp( | ||
| CurrentTimestamp(), Literal("yyyy-MM-dd HH:mm:ss")).eval().asInstanceOf[Long] | ||
| CurrentTimestamp, Literal("yyyy-MM-dd HH:mm:ss")).eval().asInstanceOf[Long] | ||
| assert(t2 - t1 <= 1) |
There was a problem hiding this comment.
This doesn't test anything now as we are testing the same CurrentTimestamp instance. Shall we move this to ComputeCurrentTimeSuite?
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
beliefer
left a comment
There was a problem hiding this comment.
LGTM except one comment and @cloud-fan 's comment.
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
Show resolved
Hide resolved
| val t1 = DateTimeUtils.microsToMillis( | ||
| literals[Long](planT1)(0)) | ||
|
|
||
| val inT2 = Project(Alias(CurrentTimestamp(), "t1")() :: Nil, LocalRelation()) |
There was a problem hiding this comment.
do we need inT2 as it's exactly the same as inT1? Seems we can just do
val planT2 = Optimize.execute(inT1.analyze).asInstanceOf[Project]
| checkLiterals({ zoneId: String => CurrentDate(Some(zoneId)) }, numTimezones) | ||
| } | ||
|
|
||
| test("datetime function CurrentDate and LocalTime are Unevaluable") { |
There was a problem hiding this comment.
This does not test the optimizer rule and we should keep it in DateExpressionsSuite
|
thanks, merging to master! |
…ime functions ### What changes were proposed in this pull request? This is a followup of #44261 to fix one regression: we missed one exception that time travel needs to evaluate current datetime functions before the analysis phase completes. This PR fixes it by explicitly compute current datetime values in the time travel handling. ### Why are the changes needed? re-support current datetime functions as time travel timestamp ### Does this PR introduce _any_ user-facing change? no, the regression has not been released yet. ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #45476 from cloud-fan/time-travel. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
<!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [x] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) ## Description <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> Heading towards the removal of codegenFallback from Date/Time expressions (apache/spark#44261), delta constraints need to resolve current_datetime expressions during the analysis of the invariants. The proposed changes work for both Spark 3.5 and Spark master. ## How was this patch tested? <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> Existing tests. In particular, `CheckConstaintsSuite` covers constraints with `current_timestamp()` expressions. Added an extra test to cover `current_date()`. ## Does this PR introduce _any_ user-facing changes? <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> No.
### What changes were proposed in this pull request? This PR is a followed of [this](#44261) PR that made "compute current time" family of expressions `Unevaluable`. Implicitly, all `Unevaluable` expressions are also not foldable. Hence, we have a regression comparing to state prior to 44261 where "compute current time" expressions could have been used in places where constant folding is required. Proposed change is to keep these expression `Unevaluable` in a sense that `eval`/`codeGen` can't be called but to allow folding. This is a special case given that these expressions are supposed to be replaced by QO with literals (that are foldable) so proposal is to create new interface for this family of expressions. ### Why are the changes needed? Explained above. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Additional test is added that ensures that `CurrentDate` can be used in places that require folding. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46120 from dbatomic/constantfoldingforcurrenttime. Lead-authored-by: Aleksandar Tomic <aleksandar.tomic@databricks.com> Co-authored-by: Aleksandar Tomic <150942779+dbatomic@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Fix GlutenDateExpressionsSuite in Spark-4.0 apache/spark#44261 make CurrentTimestamp as Unevaluable.
What changes were proposed in this pull request?
This PR moves us a bit closer to removing CodegenFallback class and instead of it relying on RuntimeReplaceable with StaticInvoke.
In this PR there are following changes:
Why are the changes needed?
Direction is to get away from CodegenFallback. This PR moves us closer to that destination.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Running existing tests.
Was this patch authored or co-authored using generative AI tooling?
No