Comet can produce different results to Spark when averaging a decimal #1354

@andygrove

Describe the bug

Given the following SQL, where c1 is a tinyint and c7 is a decimal(14,6):

SELECT c1, Avg(c7) FROM t1 GROUP BY c1 ORDER BY c1

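Some results differ between Spark and Comet, possibly because of a difference in decimal promotion or rounding. In the comparison below, the data row prefixed with ! is the one that differs: for c1 = 69 the average is 0.520313 in one column and 0.520312 in the other, a difference of one unit in the last decimal place.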

!== Correct Answer - 256 ==                 == Spark Answer - 256 ==
 struct<c1:tinyint,avg(c7):decimal(14,6)>   struct<c1:tinyint,avg(c7):decimal(14,6)>
 [68,0.595938]                              [68,0.595938]
![69,0.520313]                              [69,0.520312]
 [70,0.498929]                              [70,0.498929]
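To illustrate how a rounding difference alone could produce this kind of off-by-one in the last digit, here is a minimal sketch. The sum and count are hypothetical values chosen so that the exact average falls halfway between two 6-digit decimals; they are not taken from the generated data, and the object and method names are made up for the example. It is not a claim about how either Spark or Comet actually computes the average.

```scala
import scala.math.BigDecimal.RoundingMode

object AvgRoundingSketch {
  // Hypothetical inputs (NOT from the issue's data): a group whose
  // decimal(14,6) values sum to 4.162500 over 8 rows.
  val sum: BigDecimal = BigDecimal("4.162500")
  val count: Int = 8

  // Exact quotient: 0.5203125, which has 7 fractional digits.
  val exact: BigDecimal = sum / BigDecimal(count)

  // Reduce the exact quotient to scale 6 using the given rounding mode.
  def roundTo6(v: BigDecimal, mode: RoundingMode.Value): BigDecimal =
    v.setScale(6, mode)

  def main(args: Array[String]): Unit = {
    // Round half up: the trailing 5 rounds the last digit up.
    println(roundTo6(exact, RoundingMode.HALF_UP)) // 0.520313
    // Truncation: the trailing 5 is simply dropped.
    println(roundTo6(exact, RoundingMode.DOWN)) // 0.520312
  }
}
```

If the two engines reduce the intermediate result to the final scale differently (for example, one rounds half up and the other truncates, or one rounds at an intermediate scale first), results like the row for c1 = 69 would be expected whenever the exact average lands on such a boundary.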

Steps to reproduce

  // Requires org.apache.hadoop.fs.Path and scala.util.Random in scope.
  test("avg decimal") {
    withTempDir { dir =>
      val path = new Path(dir.toURI.toString, "test.parquet")
      val filename = path.toString
      // Fixed seed so the generated data is reproducible.
      val random = new Random(42)
      // Write the test file with Comet disabled.
      withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
        ParquetGenerator.makeParquetFile(
          random,
          spark,
          filename,
          10000,
          DataGenOptions(
            allowNull = true,
            generateNegativeZero = true,
            generateArray = false,
            generateStruct = false,
            generateMap = false))
      }
      // Read the file back as a single partition and register it as t1.
      val table = spark.read.parquet(filename).coalesce(1)
      table.createOrReplaceTempView("t1")
      // Fails with the mismatch shown above.
      checkSparkAnswer("SELECT c1, Avg(c7) FROM t1 GROUP BY c1 ORDER BY c1")
    }
  }

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)
