Skip to content

remove failing tests#469

Merged
TomAugspurger merged 1 commit intodask:masterfrom
TomAugspurger:test-fixup
Feb 21, 2019
Merged

remove failing tests#469
TomAugspurger merged 1 commit intodask:masterfrom
TomAugspurger:test-fixup

Conversation

@TomAugspurger
Copy link
Copy Markdown
Member

Theses tests were asserting that you could train on a dataframe and transform an array.

I don't really know how these worked in the past. Pandas and numpy unfortunately have different broadcasting behavior (which pandas is working to fix).

In [16]: x
Out[16]:
array([[ 1.2061934 ,  0.78132208],
       [ 0.14763367, -0.47647038],
       [ 0.37784154,  1.13200053],
       [ 1.58888898, -1.84890062]])

In [17]: s
Out[17]:
0    1.206193
1    0.781322
dtype: float64

In [18]: x * s.values
Out[18]:
array([[ 1.45490252,  0.61046419],
       [ 0.17807476, -0.37227683],
       [ 0.45574998,  0.88445701],
       [ 1.9165074 , -1.44458687]])

In [19]: x * s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-b699db99ec8b> in <module>
----> 1 x * s

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(left, right)
   1069         result = safe_na_op(lvalues, rvalues)
   1070         return construct_result(left, result,
-> 1071                                 index=left.index, name=res_name, dtype=None)
   1072
   1073     return wrapper

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/ops.py in _construct_result(left, result, index, name, dtype)
    978     not be enough; we still need to override the name attribute.
    979     """
--> 980     out = left._constructor(result, index=index, dtype=dtype)
    981
    982     out.name = name

~/miniconda3/envs/pandas-0.23.4/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260                             'Length of passed values is {val}, '
    261                             'index implies {ind}'
--> 262                             .format(val=len(data), ind=len(index)))
    263                 except TypeError:
    264                     pass

ValueError: Length of passed values is 4, index implies 2

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running this test with a few recent commits in the Dask git history, this test starts breaking with commit dask/dask@79cb051. That commit fixed an issue where we were calling da.asarray internally in Dask which converted arrays of other types to NumPy arrays when we should have been calling da.asanyarray which preserves other array subclasses.

I suspect that's why this test was passing originally. Pandas Series were being converted to NumPy arrays and the broadcasting behavior worked.

I'm +1 for the changes in this PR

@TomAugspurger
Copy link
Copy Markdown
Member Author

Thanks for digging to find the actual change @jrbourbeau :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants