[SPARK-35506][PYTHON][INFRA] Run tests with Python 3.9 in GitHub Actions#32657
HyukjinKwon wants to merge 2 commits into apache:master
Conversation
Kubernetes integration test starting
Kubernetes integration test status success
Test build #138904 has finished for PR 32657 at commit
Kubernetes integration test starting
Kubernetes integration test status success
cc @ueshin, @BryanCutler @viirya FYI
Test build #138921 has finished for PR 32657 at commit
  uses: actions/setup-python@v2
  with:
    python-version: 3.9
    architecture: x64
QQ: is it necessary to specify the architecture here?
It seems not, but let me leave it for consistency with the other places above, and to be explicit.
    architecture: x64
- name: Install Python packages (Python 3.9)
  run: |
    python3.9 -m pip install numpy 'pyarrow<5.0.0' pandas scipy xmlrunner 'plotly>=4.8'
Is it intentional to add test coverage for a new PyArrow version on Python 3.9 only?
Oh yeah, I should've commented here. Python 3.9 support was added in https://issues.apache.org/jira/browse/ARROW-10224. I tentatively tried PyArrow 4.0.0 and it worked, so I set the pin to the highest working version for now.
SGTM. The Arrow binary format remains the same so it's good to continue testing with the latest pyarrow.
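For readers unfamiliar with the `'pyarrow<5.0.0'` pin discussed above, here is a minimal sketch of what such an upper bound means. The helper names `parse_version` and `satisfies_upper_bound` are hypothetical, and the comparison is deliberately simplified: it only handles plain dotted release numbers, whereas pip itself follows the full PEP 440 rules.

```python
def parse_version(text):
    """Turn a dotted release string such as '4.0.0' into a tuple of ints.

    Simplifying assumption: no pre-release or local suffixes (e.g. '5.0.0rc1').
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_upper_bound(installed, upper_bound="5.0.0"):
    """Return True if `installed` is strictly below `upper_bound`."""
    # Tuples compare element by element, so (4, 0, 0) < (5, 0, 0).
    return parse_version(installed) < parse_version(upper_bound)


if __name__ == "__main__":
    for candidate in ["4.0.0", "4.0.1", "5.0.0", "10.0.0"]:
        print(candidate, satisfies_upper_bound(candidate))
```

Under this reading, PyArrow 4.0.0 (the version tried in the thread) is accepted, while any 5.x release is excluded until the pin is raised.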
# TODO(SPARK-35510): This fails with Python 3.9. We should fix and reenable it.
# self.assert_eq(
#     len(psdf.quantile(q=0.5, numeric_only=True)),
#     len(pdf.quantile(q=0.5, numeric_only=True)),
# )
# self.assert_eq(
#     len(psdf.quantile(q=[0.25, 0.5, 0.75], numeric_only=True)),
#     len(pdf.quantile(q=[0.25, 0.5, 0.75], numeric_only=True)),
# )
Does this only fail with Python 3.9 on GitHub Actions? I saw that we updated tests for Python 3.9 before, but it seems this was not caught previously.
Oh, this test was added after we tested with Python 3.9 (as part of pandas-on-Spark).
Also, Koalas was not running tests against Python 3.9 because Arrow lacked Python 3.9 support. It seems to be supported fine now :-).
Thanks guys! Merged to master.
…mns_should_be_discarded_if_numeric_only_is_true

### What changes were proposed in this pull request?

This PR proposes to fix and reenable `test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true`, which was disabled when we upgraded to Python 3.9 in CI at #32657.

The failure seems to be caused by a behaviour change in the latest NumPy; see also `https://github.com/numpy/numpy/pull/16273#discussion_r641264085`. pandas inherits this behaviour, but it doesn't make sense when `numeric_only` is set to `True` in pandas. I will track and follow the status of the issue between pandas and NumPy. For the time being, I propose to exclude only the boolean case in the percentile/quartile test case.

### Why are the changes needed?

To keep the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I roughly tested it locally. It should pass in CI.

Closes #32690 from HyukjinKwon/SPARK-35510.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
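The workaround described in the commit message above (leaving the boolean case out of the quantile comparison) might look roughly like this in plain pandas. This is only a sketch under stated assumptions: the DataFrame, its column names, and the use of `select_dtypes` to drop boolean columns are hypothetical illustrations, not the actual Spark test code.

```python
import pandas as pd

# Hypothetical sample data mixing numeric, boolean, and string columns.
pdf = pd.DataFrame(
    {
        "int_col": [1, 2, 3, 4],
        "float_col": [0.5, 1.5, 2.5, 3.5],
        "bool_col": [True, False, True, False],
        "str_col": ["a", "b", "c", "d"],
    }
)

# Assumption: drop boolean columns up front, since their quantile behaviour
# depends on the NumPy/pandas versions in use.
without_bool = pdf.select_dtypes(exclude=["bool"])

# numeric_only=True discards the remaining non-numeric (string) column.
single = without_bool.quantile(q=0.5, numeric_only=True)
multi = without_bool.quantile(q=[0.25, 0.5, 0.75], numeric_only=True)

print(list(single.index))
print(len(multi))
```

A single `q` yields a Series indexed by the surviving numeric columns, while a list of `q` values yields a DataFrame with one row per quantile, which is why the original test compares `len(...)` of both forms.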
What changes were proposed in this pull request?
This PR enables GitHub Actions to test PySpark with Python 3.9.
Why are the changes needed?
To verify the support of Python 3.9.
Does this PR introduce any user-facing change?
No, test-only.
How was this patch tested?
Existing tests should cover this.