Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Jan 18, 2025

Which issue does this PR close?

Closes #1307

Rationale for this change

This PR fixes a specific bug and provides an approach for solving a general problem where we are not checking for supported types when translating expressions to native.

What changes are included in this PR?

  • Move data generator from fuzz testing tool into Comet jar so that it can also be used in integration tests
  • Add type checks for array_remove and related tests

How are these changes tested?

@andygrove andygrove marked this pull request as draft January 19, 2025 14:57
@andygrove andygrove changed the title fix: Fallback to Spark for unsupported types for ArrayRemove fix: Improve testing for array_remove and fallback to Spark for unsupported types Jan 19, 2025
@andygrove andygrove marked this pull request as ready for review January 19, 2025 16:05
Range(0, num).map(_ => generateRow(schema, stringGen))
}

private val primitiveTypes = Seq(
Copy link
Member Author

@andygrove andygrove Jan 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is copied moved from the fuzz testing tool

@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 85.86957% with 13 lines in your changes missing coverage. Please review.

Project coverage is 34.91%. Comparing base (be48839) to head (f8f244c).
Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
...la/org/apache/comet/testing/ParquetGenerator.scala 89.39% 1 Missing and 6 partials ⚠️
...src/main/scala/org/apache/comet/serde/arrays.scala 66.66% 1 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1308      +/-   ##
============================================
+ Coverage     34.69%   34.91%   +0.22%     
+ Complexity      991      989       -2     
============================================
  Files           116      119       +3     
  Lines         44891    45204     +313     
  Branches       9864     9959      +95     
============================================
+ Hits          15574    15785     +211     
- Misses        26168    26253      +85     
- Partials       3149     3166      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title says array_remove but the test is for array-intersect?

Or other places that we already have tests?

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
r,
spark,
s"test$i.parquet",
numRows = conf.generateData.numRows(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numRows is for all rows from all generated files or for one single file? Do we have definition for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the number of rows per file. I pushed a commit to add descriptions for all command-line options.

  -e, --exclude-negative-zero   Whether to exclude negative zero
  -g, --generate-arrays         Whether to generate arrays
      --generate-maps           Whether to generate maps
      --generate-structs        Whether to generate structs
  -n, --num-files  <arg>        Number of files to generate
      --num-rows  <arg>         Number of rows per file
  -h, --help                    Show help message

@andygrove
Copy link
Member Author

Thanks for the reviews @viirya and @kazuyukitanimura

@andygrove andygrove merged commit 44170d7 into apache:main Jan 22, 2025
74 checks passed
@andygrove andygrove deleted the array-remove-type-checks branch January 22, 2025 17:18
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
…ported types (apache#1308)

* Fallback to Spark for unsupported types for ArrayRemove

* save progress

* improve tests

* revert debug logging

* prepare for review

* fix

* format

* remove test failure

* fix

* more testing

* refactor

* update readme

* fix

* format

* fix

* fix

* fix

* fix inverted options

* Update QueryRunner.scala

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* Add descriptions to CLI options

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_remove with structs causes error at runtime

4 participants