[SPARK-35510][PYTHON] Fix and reenable test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true #32690
HyukjinKwon wants to merge 2 commits into apache:master from
cc @xinrong-databricks and @itholic too fyi
Test build #139046 has finished for PR 32690 at commit
Kubernetes integration test starting
Kubernetes integration test status success
viirya left a comment:
Looks good. Maybe create a JIRA and add to the code comment?
Test build #139053 has finished for PR 32690 at commit
Merged to master.
Kubernetes integration test starting
Kubernetes integration test status success
What changes were proposed in this pull request?
This PR proposes to fix and reenable test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true, which was disabled when we upgraded to Python 3.9 in CI at #32657.

This seems to be caused by a behaviour change in the latest NumPy; see also https://github.com/numpy/numpy/pull/16273#discussion_r641264085. pandas inherits this behaviour, but it does not make sense when numeric_only is set to True in pandas. I will track and follow the status of the issue between pandas and NumPy.

For the time being, I propose to exclude the boolean case alone from the percentile/quartile test case.
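A minimal pandas sketch of the workaround, assuming a hypothetical frame with an integer column i, a boolean column b and a string column s (this is not the actual test data, and the helper list bool_cols is illustrative only). It first shows that pandas keeps boolean columns when numeric_only=True, which is presumably how the boolean case reaches the percentile computation. It then drops the boolean column before calling quantile, mirroring the exclusion proposed above. It deliberately never calls quantile on the boolean column, because that outcome depends on the installed NumPy version.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Hypothetical data: one integer, one boolean and one string column.
pdf = pd.DataFrame(
    {"i": [0, 1, 2], "b": [False, True, False], "s": ["x", "y", "z"]}
)

# pandas treats booleans as numeric, so numeric_only=True keeps "b" and drops "s".
sums = pdf.sum(numeric_only=True)

# Workaround sketch: exclude boolean columns before computing the quantile,
# then let numeric_only=True discard the remaining non-numeric column "s".
bool_cols = [c for c in pdf.columns if is_bool_dtype(pdf[c])]
median = pdf.drop(columns=bool_cols).quantile(q=0.5, numeric_only=True)

print(list(sums.index))  # ['i', 'b']
print(median.to_dict())  # {'i': 1.0}
```

Dropping the boolean columns explicitly keeps the rest of the numeric_only coverage (integer kept, string discarded) intact while the pandas/NumPy behaviour for booleans is still being tracked.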
Why are the changes needed?
To keep the test coverage.
Does this PR introduce any user-facing change?
No, test-only.
How was this patch tested?
I tested it roughly locally; it should also pass in CI.