Closed
In pandas 1.1, the default handling of datetime columns in `.describe` has been deprecated. Previously they were treated like categoricals (giving statistics like `unique`, `top`, and `freq`). In the future they'll be treated like numerics (giving statistics like the mean and quantiles).
In [15]: df = pd.DataFrame({"A": pd.date_range("2000", periods=2)})
In [16]: ddf = dd.from_pandas(df, npartitions=1)
In [17]: df.describe()
/Users/taugspurger/.virtualenvs/dask-dev/bin/ipython:1: FutureWarning: Treating datetime data as categorical rather than numeric in `.describe` is deprecated and will be removed in a future version of pandas. Specify `datetime_is_numeric=True` to silence this warning and adopt the future behavior now.
#!/Users/taugspurger/Envs/dask-dev/bin/python
Out[17]:
A
count 2
unique 2
top 2000-01-01 00:00:00
freq 1
first 2000-01-01 00:00:00
last 2000-01-02 00:00:00
In [18]: ddf.describe()
/Users/taugspurger/sandbox/dask/dask/dataframe/core.py:2230: FutureWarning: Treating datetime data as categorical rather than numeric in `.describe` is deprecated and will be removed in a future version of pandas. Specify `datetime_is_numeric=True` to silence this warning and adopt the future behavior now.
meta = data._meta_nonempty.describe()
/Users/taugspurger/sandbox/dask/dask/dataframe/core.py:2128: FutureWarning: Treating datetime data as categorical rather than numeric in `.describe` is deprecated and will be removed in a future version of pandas. Specify `datetime_is_numeric=True` to silence this warning and adopt the future behavior now.
meta = self._meta_nonempty.describe(include=include, exclude=exclude)
Out[18]:
Dask DataFrame Structure:
A
npartitions=1
object
...
Dask Name: describe, 19 tasks
In [19]: _.compute()
Out[19]:
A
unique 2
count 2
top 2000-01-02 00:00:00
freq 1
first 2000-01-01 00:00:00
last 2000-01-02 00:00:00

To silence this warning, we'll need to use `datetime_is_numeric`:
In [21]: df.describe(datetime_is_numeric=True)
Out[21]:
A
count 2
mean 2000-01-01 12:00:00
min 2000-01-01 00:00:00
25% 2000-01-01 06:00:00
50% 2000-01-01 12:00:00
75% 2000-01-01 18:00:00
max 2000-01-02 00:00:00

I don't know if we'll want to add that to our API or not.
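
For discussion, here is a minimal sketch of one way to opt into the numeric behavior without hard-coding a pandas version check. The helper name `describe_datetime_numeric` is hypothetical and not part of pandas or dask. It assumes the only thing we need is to pass `datetime_is_numeric=True` when the installed pandas accepts that keyword, and to call `.describe` unchanged otherwise.

```python
import inspect

import pandas as pd


def describe_datetime_numeric(frame, **kwargs):
    """Hypothetical helper (not a pandas or dask API).

    Calls ``frame.describe`` and, if the installed pandas exposes the
    ``datetime_is_numeric`` keyword, passes ``True`` for it unless the
    caller already supplied a value.
    """
    # Look at the signature instead of comparing version strings, so the
    # keyword is only passed when pandas actually accepts it.
    params = inspect.signature(type(frame).describe).parameters
    if "datetime_is_numeric" in params:
        kwargs.setdefault("datetime_is_numeric", True)
    return frame.describe(**kwargs)


# Same frame as in the session above.
df = pd.DataFrame({"A": pd.date_range("2000", periods=2)})
summary = describe_datetime_numeric(df)
print(summary)
```

If the keyword is not available, the helper simply falls back to whatever `.describe` does by default in that pandas version, so this only sidesteps the warning where the keyword exists. Whether the keyword should also be exposed on our own `describe` is the open question.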