Conversation

@andygrove (Member) commented Feb 6, 2025

Which issue does this PR close?

Closes #1354 (Comet can produce different results to Spark when averaging a decimal)

Follow on issue: #1371

Rationale for this change

Fixes a correctness issue where Comet could produce different results to Spark when averaging a decimal.

What changes are included in this PR?

  • Add test to demonstrate the bug
  • Mark cast from float/double to decimal as incompatible (a hedged sketch of the behavior follows below)
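
A minimal sketch of the behavior change (my illustration, not code from this PR): averaging a DECIMAL column is the query shape from #1354 where Comet's native result could differ from Spark's. With this change, CAST from float/double to decimal is marked incompatible, so Comet is expected to fall back to Spark for plans that need it unless incompatible casts are explicitly allowed (assuming the existing spark.comet.cast.allowIncompatible flag governs this).

import org.apache.spark.sql.SparkSession

object AvgDecimalRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    // Decimal values whose average exercises rounding behavior.
    val df = Seq(BigDecimal("1.005"), BigDecimal("2.005")).toDF("c7")
    df.createOrReplaceTempView("t")

    // Before this PR, Comet could return a different value here than Spark;
    // after it, the offending cast is no longer handed to native execution.
    spark.sql("SELECT avg(c7) FROM t").show()
  }
}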

How are these changes tested?

New test + existing tests

@andygrove (Member, Author) commented:

I don't understand the following test failure:

2025-02-06T17:32:38.2264489Z - final decimal avg *** FAILED *** (17 milliseconds)
2025-02-06T17:32:38.2265038Z   org.apache.spark.sql.AnalysisException: Inserting into an RDD-based table is not allowed.;
2025-02-06T17:32:38.2265561Z 'InsertIntoStatement Repartition 1, false, false, false
2025-02-06T17:32:38.2265884Z +- LocalRelation [col1#340151, col2#340152]

Comment on lines -873 to +892
- val table = "t1"
+ val table = s"final_decimal_avg_$dictionaryEnabled"
@andygrove (Member, Author) commented:

These changes ended up not being strictly necessary, but the test mixed hard-coded t1 with $tableName variable references in its SQL, so I made the references consistent.
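
For illustration, the consistent shape looks roughly like this (a sketch assuming the suite's withTable and checkSparkAnswer helpers, with column names borrowed from the snippet quoted further down):

val table = s"final_decimal_avg_$dictionaryEnabled"
withTable(table) {
  // Every reference goes through the variable; no hard-coded "t1" remains.
  spark.sql(s"CREATE TABLE $table (c1 INT, c7 DECIMAL(10, 2)) USING parquet")
  spark.sql(s"INSERT INTO $table VALUES (1, 1.23), (1, 2.34)")
  checkSparkAnswer(s"SELECT c1, avg(c7) FROM $table GROUP BY c1 ORDER BY c1")
}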

@codecov-commenter commented Feb 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 39.17%. Comparing base (f09f8af) to head (880328b).
Report is 20 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1372       +/-   ##
=============================================
- Coverage     56.12%   39.17%   -16.96%     
- Complexity      976     2065     +1089     
=============================================
  Files           119      262      +143     
  Lines         11743    60327    +48584     
  Branches       2251    12836    +10585     
=============================================
+ Hits           6591    23631    +17040     
- Misses         4012    32223    +28211     
- Partials       1140     4473     +3333     

☔ View full report in Codecov by Sentry.

@andygrove andygrove marked this pull request as ready for review February 7, 2025 02:35
withTable(tableName) {
val table = spark.read.parquet(filename).coalesce(1)
table.createOrReplaceTempView(tableName)
checkSparkAnswer(s"SELECT c1, avg(c7) FROM $tableName GROUP BY c1 ORDER BY c1")
Contributor commented:

Would you mind adding // https://github.com/apache/datafusion-comet/issues/1371 and a note to use checkSparkAnswerAndNumOfAggregates once that issue is resolved?

@andygrove (Member, Author) commented:

Added
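
Presumably the resulting snippet looks something like this (reconstructed for illustration, not copied from the merged diff):

withTable(tableName) {
  val table = spark.read.parquet(filename).coalesce(1)
  table.createOrReplaceTempView(tableName)
  // https://github.com/apache/datafusion-comet/issues/1371
  // TODO: use checkSparkAnswerAndNumOfAggregates once the issue is resolved
  checkSparkAnswer(s"SELECT c1, avg(c7) FROM $tableName GROUP BY c1 ORDER BY c1")
}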

@andygrove andygrove merged commit 26b8d57 into apache:main Feb 7, 2025
74 checks passed
@andygrove andygrove deleted the avg-decimal-fix branch February 7, 2025 20:07
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

* add failing test

* Mark cast from float/double to decimal as incompat

* update docs

* update cast tests

* link to issue

* fix regressions

* use unique table name in test

* use withTable

* address feedback