Skip to content

[SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression#44261

Closed
dbatomic wants to merge 22 commits intoapache:masterfrom
dbatomic:codegenfallback_removal
Closed

[SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression#44261
dbatomic wants to merge 22 commits intoapache:masterfrom
dbatomic:codegenfallback_removal

Conversation

@dbatomic
Copy link
Copy Markdown
Contributor

@dbatomic dbatomic commented Dec 8, 2023

What changes were proposed in this pull request?

This PR moves us a bit closer to removing CodegenFallback class and instead of it relying on RuntimeReplaceable with StaticInvoke.

In this PR there are following changes:

  • Doing StaticInvoke + RuntimeReplaceable against spark version expression.
  • Adding Unevaluable trait for DateTime expressions. These expressions need to be replaced during analysis anyhow so we explicitly forbid eval from being called.

Why are the changes needed?

Direction is to get away from CodegenFallback. This PR moves us closer to that destination.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Running existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Dec 8, 2023
@@ -79,30 +79,22 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
}

test("datetime function current_date") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and below are supposed to be positive tests. Do we have positive tests for the functions current_date(), current_timestamp(), localtimestamp() in Spark code base?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually these tests were not useful at all previously, as they are testing dead code. We should check ComputeCurrentTimeSuite and make sure it has good test coverage for the execution part.

@dbatomic
Copy link
Copy Markdown
Contributor Author

After looking at test failures, there seem to be 'minor' correctness bug for time aware expressions that rely on ReplaceCurrentLike rule. For following expression:

select b from values (
 'one', current_timestamp),
 'two', current_timestamp),
 'three', current_timestamp as data(a, b)

current_timestamp will not be replaced. Instead, it will be evaluated every time which means that these three rows will have different values (tested locally). This happens because ResolveInlineTables rule happens first and it tries to eval all the expressions. IMO, ReplaceCurrentLike should have higher precedence.
I will try to fix this as part of this PR.

@cloud-fan and @MaxGekk as FYI.

@MaxGekk
Copy link
Copy Markdown
Member

MaxGekk commented Dec 12, 2023

I will try to fix this as part of this PR.

@dbatomic Could you open a separate PR with a bug fix. In this way, we can backport it without your changes in this PR.

@dbatomic
Copy link
Copy Markdown
Contributor Author

I will try to fix this as part of this PR.

@dbatomic Could you open a separate PR with a bug fix. In this way, we can backport it without your changes in this PR.

I just created this PR:
#44316

Copy link
Copy Markdown
Contributor

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this approach looks good, I just have a couple comments.

@cloud-fan
Copy link
Copy Markdown
Contributor

LGTM except for #44261 (comment) . Can we verify the test coverage for current datetime functions in ComputeCurrentTimeSuite?

@dbatomic
Copy link
Copy Markdown
Contributor Author

LGTM except for #44261 (comment) . Can we verify the test coverage for current datetime functions in ComputeCurrentTimeSuite?

Hopefully everything looks fine now.

val t2 = UnixTimestamp(
CurrentTimestamp(), Literal("yyyy-MM-dd HH:mm:ss")).eval().asInstanceOf[Long]
CurrentTimestamp, Literal("yyyy-MM-dd HH:mm:ss")).eval().asInstanceOf[Long]
assert(t2 - t1 <= 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test anything now as we are testing the same CurrentTimestamp instance. Shall we move this to ComputeCurrentTimeSuite?

Copy link
Copy Markdown
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one comment and @cloud-fan 's comment.

val t1 = DateTimeUtils.microsToMillis(
literals[Long](planT1)(0))

val inT2 = Project(Alias(CurrentTimestamp(), "t1")() :: Nil, LocalRelation())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need inT2 as it's exactly the same as inT1? Seems we can just do

val planT2 = Optimize.execute(inT1.analyze).asInstanceOf[Project]

checkLiterals({ zoneId: String => CurrentDate(Some(zoneId)) }, numTimezones)
}

test("datetime function CurrentDate and LocalTime are Unevaluable") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not test the optimizer rule and we should keep it in DateExpressionsSuite

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in ee1fd92 Jan 9, 2024
MaxGekk pushed a commit that referenced this pull request Mar 12, 2024
…ime functions

### What changes were proposed in this pull request?

This is a followup of #44261 to fix one regression: we missed one exception that time travel needs to evaluate current datetime functions before the analysis phase completes. This PR fixes it by explicitly compute current datetime values in the time travel handling.

### Why are the changes needed?

re-support current datetime functions as time travel timestamp

### Does this PR introduce _any_ user-facing change?

no, the regression has not been released yet.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45476 from cloud-fan/time-travel.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
tdas pushed a commit to delta-io/delta that referenced this pull request Mar 21, 2024
<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

<!--
- Describe what this PR changes.
- Describe why we need the change.
 
If this PR resolves an issue be sure to include "Resolves #XXX" to
correctly link and close the issue upon merge.
-->

Heading towards the removal of codegenFallback from Date/Time
expressions (apache/spark#44261), delta
constraints need to resolve current_datetime expressions during the
analysis of the invariants. The proposed changes work for both Spark 3.5
and Spark master.

## How was this patch tested?

<!--
If tests were added, say they were added here. Please make sure to test
the changes thoroughly including negative and positive cases if
possible.
If the changes were tested in any way other than unit tests, please
clarify how you tested step by step (ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future).
If the changes were not tested, please explain why.
-->
Existing tests. In particular, `CheckConstaintsSuite` covers constraints
with `current_timestamp()` expressions. Added an extra test to cover
`current_date()`.

## Does this PR introduce _any_ user-facing changes?

<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->
No.
cloud-fan pushed a commit that referenced this pull request Apr 22, 2024
### What changes were proposed in this pull request?

This PR is a followed of [this](#44261) PR that made "compute current time" family of expressions `Unevaluable`. Implicitly, all `Unevaluable` expressions are also not foldable. Hence, we have a regression comparing to state prior to 44261 where "compute current time" expressions could have been used in places where constant folding is required.

Proposed change is to keep these expression `Unevaluable` in a sense that `eval`/`codeGen`  can't be called but to allow folding. This is a special case given that these expressions are supposed to be replaced by QO with literals (that are foldable) so proposal is to create new interface for this family of expressions.

### Why are the changes needed?

Explained above.

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Additional test is added that ensures that `CurrentDate` can be used in places that require folding.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46120 from dbatomic/constantfoldingforcurrenttime.

Lead-authored-by: Aleksandar Tomic <aleksandar.tomic@databricks.com>
Co-authored-by: Aleksandar Tomic <150942779+dbatomic@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
zhouyuan pushed a commit to apache/gluten that referenced this pull request Nov 25, 2025
Fix GlutenDateExpressionsSuite in Spark-4.0
apache/spark#44261 make CurrentTimestamp as Unevaluable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants