Error using df.index in and condition #3227

@postelrich

When I filter on the index combined with another condition, I get ValueError: `dtype` inference failed in `elemwise`. The identical condition works in pandas. The issue seems related to the comparison on ddf.index producing a dask array (the failing __and__ in the stacktrace is in dask/array/core.py). This is on dask 0.17.1.

Example

import pandas as pd
import dask.dataframe as dd

# Datetime-indexed frame with a categorical column
df = pd.DataFrame({'time': pd.date_range('2018-01-01', periods=10),
                   'mycat': list('ABCABCABCA')})
df = df.set_index('time')
df['mycat'] = df.mycat.astype('category')

# pandas: works and gives a boolean mask
(df.index > '2018-01-01') & (df.mycat == 'A')

# dask: the same expression raises the ValueError shown under Stacktrace
ddf = dd.from_pandas(df, npartitions=2)
(ddf.index > '2018-01-01') & (ddf.mycat == 'A')

Stacktrace

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-11ce8cc93904> in <module>()
      8 (df.index > '2018-01-01') & (df.mycat == 'A')
      9 ddf = dd.from_pandas(df, npartitions=2)
---> 10 (ddf.index > '2018-01-01') & (ddf.mycat == 'A')
     11 # (ddf.mycat == 'A') & (ddf.index > '2018-01-01')

~/anaconda3/envs/cme/lib/python3.6/site-packages/dask/array/core.py in __and__(self, other)
   1455 
   1456     def __and__(self, other):
-> 1457         return elemwise(operator.and_, self, other)
   1458 
   1459     def __rand__(self, other):

~/anaconda3/envs/cme/lib/python3.6/site-packages/dask/array/core.py in elemwise(op, *args, **kwargs)
   2935                 if not is_scalar_for_elemwise(a) else a
   2936                 for a in args]
-> 2937         dt = apply_infer_dtype(op, vals, {}, 'elemwise', suggest_dtype=False)
   2938         need_enforce_dtype = any(not is_scalar_for_elemwise(a) and a.ndim == 0 for a in args)
   2939 

~/anaconda3/envs/cme/lib/python3.6/site-packages/dask/array/core.py in apply_infer_dtype(func, args, kwargs, funcname, suggest_dtype)
    544         msg = None
    545     if msg is not None:
--> 546         raise ValueError(msg)
    547     return o.dtype
    548 

ValueError: `dtype` inference failed in `elemwise`.

Original error is below:
------------------------
AttributeError("'numpy.ndarray' object has no attribute 'index'",)

Traceback:
---------
  File "/Users/rpostelnik/anaconda3/envs/cme/lib/python3.6/site-packages/dask/array/core.py", line 529, in apply_infer_dtype
    o = func(*args, **kwargs)
  File "/Users/rpostelnik/anaconda3/envs/cme/lib/python3.6/site-packages/dask/dataframe/core.py", line 1706, in __array_wrap__
    index = context[1][0].index
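
My reading of the traceback, which may be wrong: during dtype inference the dask array operand appears to be replaced by a small NumPy placeholder, and `__array_wrap__` in dask/dataframe/core.py then looks up `.index` on the first ufunc input, which is that placeholder and not the Series. The snippet below only mimics that lookup with stand-in objects; `first_input_index` and the placeholder values are illustrative assumptions, not dask code.

```python
import numpy as np
import pandas as pd

# Stand-ins for the two operands of `array & series` (assumed values).
placeholder = np.empty((1,), dtype=bool)
series = pd.Series([True], index=pd.date_range('2018-01-01', periods=1))


def first_input_index(inputs):
    # Hypothetical mirror of `context[1][0].index`: take the index of
    # whichever operand comes first.
    return inputs[0].index


try:
    first_input_index((placeholder, series))
    error = None
except AttributeError as exc:
    # A plain ndarray has no `.index`, matching the original error text.
    error = str(exc)

# With the Series first, the same lookup succeeds.
index_when_series_first = first_input_index((series, placeholder))
```

This would also fit with the commented-out line in the traceback that puts `ddf.mycat == 'A'` first, although the report does not say whether that ordering works.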
