Test pandas 1.1.x / 1.2.0 releases and pandas nightly #6996
jsignell merged 44 commits into dask:master

Thanks for starting this, I've been a bit busy lately :)

I fixed some of the obvious ones. Mind if I push?

Great, feel free to push! Some notes about the failures I have investigated so far:

I was just reading through the FutureWarnings and trying to fix any that I could.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

The following failure is due to pandas-dev/pandas#28507 (comparing a tz-naive and a tz-aware timestamp no longer raises an error but returns False):
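
A minimal sketch of that behaviour change, assuming a pandas release that includes pandas-dev/pandas#28507 (the thread hits it when testing against 1.2.0): equality between a tz-naive and a tz-aware `Timestamp` evaluates to `False` instead of raising, while ordering comparisons still raise `TypeError`. The variable names are illustrative only.

```python
import pandas as pd

naive = pd.Timestamp("2020-01-01")
aware = pd.Timestamp("2020-01-01", tz="UTC")

# Equality no longer raises: mismatched tz-awareness simply compares unequal.
is_equal = naive == aware
is_not_equal = naive != aware

# Ordering between tz-naive and tz-aware values is still undefined and raises.
try:
    _ = naive < aware
    ordering_raises = False
except TypeError:
    ordering_raises = True
```

A test that used to expect a `TypeError` from `==` therefore has to be adjusted, while tests relying on `<` or `>` raising keep working.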

For the failing
It seems reasonable to me to just skip that test for this case until someone gets around to implementing it. The dask implementation depends on var, which pandas doesn't yet have for datetime.

```diff
diff --git a/dask/dataframe/tests/test_arithmetics_reduction.py b/dask/dataframe/tests/test_arithmetics_reduction.py
index c04d3c07..2758de23 100644
--- a/dask/dataframe/tests/test_arithmetics_reduction.py
+++ b/dask/dataframe/tests/test_arithmetics_reduction.py
@@ -6,7 +6,7 @@ import numpy as np
 import pandas as pd

 import dask.dataframe as dd
-from dask.dataframe._compat import PANDAS_GT_100, PANDAS_VERSION
+from dask.dataframe._compat import PANDAS_GT_100, PANDAS_GT_120, PANDAS_VERSION
 from dask.dataframe.utils import (
     assert_eq,
     assert_dask_graph,
@@ -1002,7 +1002,12 @@ def test_reductions_non_numeric_dtypes():
         assert_eq(dds.min(), pds.min())
         assert_eq(dds.max(), pds.max())
         assert_eq(dds.count(), pds.count())
-        check_raises(dds, pds, "std")
+        if PANDAS_GT_120 and pds.dtype == "datetime64[ns]":
+            # std is implemented for datetimes in pandas 1.2.0, but dask
+            # implementation depends on var which isn't
+            pass
+        else:
+            check_raises(dds, pds, "std")
         check_raises(dds, pds, "var")
         check_raises(dds, pds, "sem")
         check_raises(dds, pds, "skew")
```
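
The version-gated branch in the diff can be sketched outside the dask test suite as follows. This is a minimal, pandas-only sketch assuming pandas >= 1.2 behaviour for datetime reductions (`std` returns a `Timedelta`, `var` raises). `pandas_version_at_least` and `raises` are hypothetical helpers standing in for dask's `PANDAS_GT_120` flag and the test's `check_raises`, whose exact definitions are not shown here.

```python
import pandas as pd


def pandas_version_at_least(major: int, minor: int) -> bool:
    """Hypothetical stand-in for a flag like PANDAS_GT_120.

    Only major/minor are compared, so a release candidate such as
    "1.2.0rc0" already counts as 1.2.
    """
    parts = pd.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


def raises(func) -> bool:
    """Return True if calling func raises TypeError or ValueError."""
    try:
        func()
    except (TypeError, ValueError):
        return True
    return False


pds = pd.Series(pd.date_range("2016-01-01", periods=5, freq="D"))

if pandas_version_at_least(1, 2) and pd.api.types.is_datetime64_any_dtype(pds.dtype):
    # std works for datetimes here, so the old "std must raise" check is skipped
    std_ok = isinstance(pds.std(), pd.Timedelta)
else:
    # older pandas: std on datetimes is expected to raise
    std_ok = raises(pds.std)

# var is still not implemented for datetimes, so it is expected to raise
var_raises = raises(pds.var)
```

`is_datetime64_any_dtype` is used instead of the diff's `== "datetime64[ns]"` so the sketch also matches the non-nanosecond datetime resolutions that newer pandas versions can produce.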
Indeed, that sounds best. We should open an issue for that on the pandas side. The parquet failures are a bit strange: I can reproduce them locally in an environment with pandas 1.2.0rc, but I also seem to get the same failure with the latest stable pandas (while here on CI there is no such failure). So there might be an interaction with another library version in play, though I don't immediately see which one.
I have been seeing the parquet failures locally for a while. I can't tell whether they are related to the pandas version or not. I don't think dask has been testing against pandas > 1.0.* at all, except in the upstream build, which has been failing for a while (#6148).

Ah, indeed, it's already failing on pandas 1.1 but passing on pandas 1.0. So writing a partitioned parquet dataset where the partition column has categorical dtype is basically completely broken.
Ouch, pandas 1.1 is half a year old, so we probably should have been testing against it.

I opened pandas-dev/pandas#38642 for this on the pandas side.
Maybe we should xfail those tests for now while the conversation goes on.
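
A minimal sketch of such an xfail, assuming pytest and a version check on pandas >= 1.1 (where the thread reports the breakage starts). The flag name `PANDAS_GE_110`, the test name and its body are hypothetical, not the actual dask tests; `strict=False` keeps the test from failing once the pandas-side fix lands and it starts passing again.

```python
import pandas as pd
import pytest

# Hypothetical flag: compare only the major/minor pandas release numbers.
PANDAS_GE_110 = tuple(int(p) for p in pd.__version__.split(".")[:2]) >= (1, 1)


@pytest.mark.xfail(
    PANDAS_GE_110,
    reason="writing parquet partitioned on a categorical column is broken, "
    "see pandas-dev/pandas#38642",
    strict=False,
)
def test_partition_on_categorical(tmp_path):
    # Hypothetical test body; it only runs when dask and pyarrow are installed.
    dd = pytest.importorskip("dask.dataframe")
    pytest.importorskip("pyarrow")

    df = pd.DataFrame({"x": range(6), "cat": pd.Categorical(list("aabbcc"))})
    ddf = dd.from_pandas(df, npartitions=2)
    ddf.to_parquet(str(tmp_path), partition_on=["cat"])
    result = dd.read_parquet(str(tmp_path)).compute()
    assert len(result) == len(df)
```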

I think we are mostly down to the ufunc tests. It sounds like you already have an idea of how to fix them?

Ah no, there are failures on https://github.com/dask/dask/pull/6996/checks?check_run_id=1597495686 now as well. Maybe the sparse skip in 90de1b4 should be more nuanced?
```diff
 ).compute()
-out["lon"] = out.lon.astype("int")  # just to pass assert
+# convert categorical to plain int just to pass assert
+out["lon"] = out.lon.astype(df0.lon.dtype)
```

There was an int64 vs. int32 issue on Windows in the Python 3.8 environment with pandas 1.2.0 (https://github.com/dask/dask/pull/6996/checks?check_run_id=1681175945#step:5:249). I'm not sure how that would be caused by pandas, though, or why it only started failing now.

Hmm, it's still failing with this change.
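
The idea behind the suggested change is that `astype("int")` resolves to NumPy's platform default integer, which is 32-bit on Windows with NumPy < 2.0, whereas `astype(df0.lon.dtype)` pins the result to the dtype of the source frame. A minimal sketch, with a hypothetical `df0` standing in for the test's input frame; note that, per the thread, this change alone did not make the Windows failure go away.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the test's input frame with an int64 "lon" column.
df0 = pd.DataFrame({"lon": np.array([10, 20, 10, 30], dtype="int64")})

# Stand-in for the computed result, where "lon" comes back as a categorical.
out = df0.assign(lon=df0["lon"].astype("category"))

# "int" maps to NumPy's default integer, whose width depends on the platform
# (and on the NumPy version), so this may be int32 or int64.
platform_int = out["lon"].astype("int")

# Casting to the source dtype reproduces int64 regardless of platform.
pinned = out["lon"].astype(df0["lon"].dtype)
```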
```diff
-prolling = df.a.rolling(window, center=center)
-drolling = ddf.a.rolling(window, center=center)
+prolling = df.a.rolling(window, center=center, min_periods=min_periods)
+drolling = ddf.a.rolling(window, center=center, min_periods=min_periods)
```

It seems that this test now fails once every few runs; see e.g. https://github.com/dask/dask/pull/6996/checks?check_run_id=1681392897#step:5:191
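
The review change passes a `min_periods` value to both the pandas and the dask rolling calls, so the test also covers partially filled windows at the series edges. A small pandas-only illustration of what that parameter changes (the values are arbitrary):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default min_periods equals the window size, so incomplete edge windows give NaN.
default = s.rolling(3, center=True).sum()

# min_periods=1 aggregates whatever observations the edge windows contain.
relaxed = s.rolling(3, center=True, min_periods=1).sum()
```

With `center=True`, the first and last windows are the incomplete ones, which is exactly where dask has to pull in data from neighbouring partitions.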

Is there anything I can do to help with this?

I've been able to successfully apply this to the Fedora dask package for testing in Rawhide, and it appears to pass the tests with pandas 1.2.0. Other than a rebase/merge, I'm not sure anything else needs to be done here.

cc @crusaderky

Merged master to resolve conflicts. I think the last remaining item for this PR is #6996 (comment): the rolling tests fail once every few runs. This is a floating point precision issue; setting a higher tolerance in the assert might solve it, but it's still potentially worth investigating why it fails with pandas 1.2 and not with other versions. In addition, a few follow-ups are required (things for which I only added a workaround or a skip in this PR), which I will list in a new issue.

See e.g. the failure in the last builds: https://github.com/dask/dask/pull/6996/checks?check_run_id=1730329886#step:5:192 (here it failed in the Mac build, but it's not Mac-specific: it failed on Linux before as well)
There are other places where we have raised the tolerance for specific pandas versions (https://github.com/dask/dask/blob/72304a94c98ace592f01df91e3d9e89febda307c/dask/dataframe/tests/test_rolling.py#L263:L270); we should probably do something similar here.

I'm going to raise the tolerance on the rolling precision tests so that we can hopefully get this in before the release on Friday.
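
Raising the tolerance only for the affected pandas versions, as suggested above, could look roughly like this. It is a pandas-only sketch that assumes `pandas.testing.assert_series_equal` with `check_exact` and `rtol` (available since pandas 1.1); `rolling_assert_kwargs` and the tolerance values are hypothetical, not the ones used in dask's `test_rolling.py`, and a naive NumPy recomputation stands in for the dask result.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_series_equal

PANDAS_MAJOR_MINOR = tuple(int(p) for p in pd.__version__.split(".")[:2])


def rolling_assert_kwargs() -> dict:
    # Hypothetical tolerances: looser on pandas >= 1.2, where the thread saw
    # occasional floating point mismatches in rolling results.
    if PANDAS_MAJOR_MINOR >= (1, 2):
        return {"check_exact": False, "rtol": 1e-3}
    return {"check_exact": False}


rng = np.random.default_rng(42)
s = pd.Series(rng.normal(size=50))
window = 5
half = window // 2

result = s.rolling(window, center=True, min_periods=1).mean()

# Naive reference: recompute every centered (possibly partial) window with NumPy.
expected = pd.Series(
    [s.iloc[max(0, i - half): i + half + 1].to_numpy().mean() for i in range(len(s))]
)

assert_series_equal(result, expected, **rolling_assert_kwargs())
```

Keeping the tolerance in one helper means the version gate can be dropped in a single place once the precision difference is understood.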

I think we should revert the "run upstream every time" change and merge this.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Thanks for those last updates! I think it's indeed good to be merged.
```bash
# nightly numpy/pandas again
conda update -y -c arrow-nightlies pyarrow

conda uninstall --force pandas
```

Hmm, the special commit message for test-upstream doesn't seem to work as expected. That isn't this PR's job, though. Will merge when green.

I pushed an empty commit to try to get the tests to pass.

@jorisvandenbossche 👏 👏 merged!

And I've now opened the follow-up issue listing all outstanding issues to resolve: #7100