
Cast String to Date ANSI Mode - Spark 3.2 - Mismatch between Spark and Comet Errors #440

@vidyasankarv


Describe the bug

When a String that is not a valid date is cast to DateType with ANSI mode enabled, the error message differs between Spark versions.

In Spark 3.2 the error message is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.10 executor driver): java.time.DateTimeException: Cannot cast 0 to DateType.

In Spark 3.3 and above the error message is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.10 executor driver): org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '0' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.

Currently Comet's error messages only match Spark 3.3 and above.
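
To see which message a given Spark version produces on its own (no Comet involved), a one-line check in an ANSI-enabled spark-shell is enough. This is a minimal sketch; the driver-side wrapping may differ slightly from the executor-side stack traces quoted above:

// Throws with spark.sql.ansi.enabled=true; per the messages quoted above:
//   Spark 3.2:  java.time.DateTimeException: Cannot cast 0 to DateType.
//   Spark 3.3+: SparkDateTimeException: [CAST_INVALID_INPUT] ...
spark.sql("select cast('0' as date)").collect()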

Steps to reproduce

In the CometTestSuite cast StringType to DateType test, we added an assumption so that the test only runs on Spark 3.3 and above. Removing that assumption triggers a test failure when the suite is run with JDK 1.8 and Spark 3.2.0. A minimal sketch of such a guard follows.
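
The guard looks roughly like the sketch below; isSpark33Plus is an illustrative stand-in for whatever version helper the suite actually uses, not the exact code in the repository:

// Hypothetical sketch of the version guard in the cast test.
def isSpark33Plus: Boolean = {
  val Array(major, minor) = org.apache.spark.SPARK_VERSION.split("\\.").take(2).map(_.toInt)
  major > 3 || (major == 3 && minor >= 3)
}

test("cast StringType to DateType") {
  // Removing this assumption lets the test run (and currently fail) on Spark 3.2.
  assume(isSpark33Plus)
  // ... existing cast-test body ...
}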

You can also reproduce this locally in a spark-shell session with JDK 1.8 and Spark 3.2.0:

$SPARK_HOME/bin/spark-shell --conf spark.sql.ansi.enabled=true

import org.apache.spark.sql._
import org.apache.spark.sql.types._

import java.io.File
import java.nio.file.Files

// Write the DataFrame to a temporary Parquet file and read it back,
// so the subsequent query scans Parquet data.
def roundtripParquet(df: DataFrame): DataFrame = {
  val tempDir = Files.createTempDirectory("spark").toString
  val filename = new File(tempDir, s"castTest_${System.currentTimeMillis()}.parquet").toString
  df.write.mode(SaveMode.Overwrite).parquet(filename)
  spark.read.parquet(filename)
}

import spark.implicits._

// "0" is not a valid date, so with ANSI mode enabled the cast below throws.
val data = roundtripParquet(Seq("0").toDF("a"))
data.createOrReplaceTempView("t")
val df = spark.sql(s"select a, cast(a as ${DataTypes.DateType.sql}) from t order by a")
df.collect().foreach(println)
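
Under ANSI mode the collect() above fails. A version-gated check along the following lines shows the mismatch that a single expected message cannot cover; the expected fragments are copied from the messages quoted earlier in this issue, and this is only a sketch:

// The error fragment a test would need to expect depends on the Spark version.
val expectedFragment =
  if (spark.version.startsWith("3.2")) "Cannot cast 0 to DateType"
  else "CAST_INVALID_INPUT"

val failure = scala.util.Try(df.collect())
assert(failure.isFailure && failure.failed.get.getMessage.contains(expectedFragment))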

Expected behavior

The CometTestSuite cast StringType to DateType test should pass on Spark 3.2.0, i.e. Comet should produce an error matching Spark 3.2's message when running against Spark 3.2.

Additional context

#383 (comment)
