[MRG] Errors for pandas sparse arrays as target by thomasjpfan · Pull Request #14125 · scikit-learn/scikit-learn

thomasjpfan · 2019-06-19T15:48:38Z

Reference Issues/PRs

Resolves #14002
Resolves #14005

What does this implement/fix? Explain your changes.

With pandas >= 0.24, we could support pandas sparse arrays (see #14005 (comment)). However, because pandas uses np.nan as the zero (fill) value by default, this PR continues to raise an error for pandas sparse arrays:

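import numpy as np
import pandas as pd

# Unstored entries are represented by np.nan, the default fill value.
a = pd.SparseArray([1, np.nan, 2, 1, np.nan])

# Every conversion path densifies to the same float array with nan entries.
np.array(a)
# array([ 1., nan,  2.,  1., nan])

np.array(pd.SparseSeries(a))
# array([ 1., nan,  2.,  1., nan])

np.array(pd.Series(a))
# array([ 1., nan,  2.,  1., nan])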

rth

Looks good, thanks!

rth · 2019-06-19T16:06:08Z

sklearn/utils/multiclass.py

-        raise ValueError("y cannot be class 'SparseSeries'.")
+    sparse_pandas = (y.__class__.__name__ in ['SparseSeries', 'SparseArray'])
+    if sparse_pandas:
+        with suppress(ImportError):


jnothman · 2019-06-19T21:09:38Z

Should there be a sparse efficiency warning?

thomasjpfan · 2019-06-19T21:14:58Z

CI error is unrelated.

> Should there be a sparse efficiency warning?

When a user calls type_of_target?

jnothman · 2019-06-19T21:57:45Z

No, upon check_array I suppose

glemaitre · 2019-06-20T13:01:12Z

> Should there be a sparse efficiency warning?

I am split on the idea. Ideally we should, but it might be noisy given that we only convert the target.

thomasjpfan · 2019-06-25T14:35:17Z

> I am split on the idea. Ideally we should, but it might be noisy given that we only convert the target.

I am okay with an efficiency warning in check_array. Because pandas sparse series were not supported before, adding a warning would not introduce new warnings into code that currently works.

thomasjpfan · 2019-06-25T14:42:43Z

I can see three ways to handle this:

1. Convert the pandas sparse array into a scipy sparse object. (This may not be worth it.)
2. Warn in check_array when a pandas sparse array is passed.
3. Continue raising an exception for any pandas sparse array.
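
To make the difference between the warning and the exception options concrete, here is a rough sketch of the two behaviours behind one switch. Everything named here is an assumption for illustration: `check_target`, the `on_sparse` parameter, and `SparseTargetWarning` do not exist in scikit-learn, and the stand-in `SparseArray` class replaces the pandas one so the snippet needs only the standard library.

```python
import warnings

PANDAS_SPARSE_NAMES = ("SparseSeries", "SparseArray")


class SparseTargetWarning(UserWarning):
    """Hypothetical warning category, defined only for this sketch."""


def check_target(y, on_sparse="raise"):
    """Apply the 'warn' or 'raise' policy to a candidate target.

    Hypothetical helper: non-sparse inputs are returned unchanged.
    """
    if type(y).__name__ not in PANDAS_SPARSE_NAMES:
        return y
    if on_sparse == "raise":
        raise ValueError("y cannot be a pandas sparse array")
    if on_sparse == "warn":
        warnings.warn(
            "pandas sparse target would be converted to a dense array",
            SparseTargetWarning,
            stacklevel=2,
        )
        return y  # the caller would densify after the warning
    raise ValueError(f"unknown policy: {on_sparse!r}")


class SparseArray:
    """Stand-in for the pandas class, used only for this illustration."""


check_target(SparseArray(), on_sparse="warn")  # emits SparseTargetWarning
```

Raising keeps the behaviour users already see, while warning would allow the input through at the cost of a dense copy.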

thomasjpfan · 2019-07-04T16:05:37Z

Given that pandas uses np.nan as the zero value for its sparse arrays/series, supporting them may lead to confusion. This PR was updated to always raise an error when a pandas sparse array is used as a target.

jnothman · 2019-07-04T23:50:56Z

I thought at some point that the sparse value in pandas was configurable.

thomasjpfan · 2019-07-05T00:47:31Z

Ah yes, it can be specified using fill_value: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseArray.html
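
A small sketch of what a non-default fill_value looks like, assuming a recent pandas release where the class is spelled pd.arrays.SparseArray (the pd.SparseArray spelling used earlier in this thread is the older alias). The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# With fill_value=0, zeros are the entries left unstored, not np.nan.
values = [1, 0, 2, 1, 0]
sparse_zero = pd.arrays.SparseArray(values, fill_value=0)

# Only the non-fill entries are kept in sp_values.
stored = sparse_zero.sp_values

# Densifying restores the zeros, so no nan appears in the result.
dense = np.asarray(sparse_zero)

print(sparse_zero.fill_value)
print(stored)
print(dense)
```

With an integer fill value the dense result keeps its integer entries, unlike the np.nan default, which forces a float array.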

At the moment this PR only updates a test to use SparseArray, because SparseSeries is deprecated, and checks for both in type_of_target.

amueller · 2019-07-17T17:30:54Z

I haven't followed the whole discussion but this seems good?

glemaitre · 2019-07-19T07:58:18Z

Thanks @thomasjpfan

thomasjpfan added 2 commits June 19, 2019 11:42

BUG Updates support for pandas sparse arrays

cd176ed

CLN Uses nans

8b80365

rth approved these changes Jun 19, 2019

thomasjpfan added 3 commits June 25, 2019 14:29

Merge remote-tracking branch 'upstream/master' into type_of_target_sp…

62495e1

…arse

Merge remote-tracking branch 'upstream/master' into type_of_target_sp…

88e0ad5

…arse

REV Removes support for pandas sparse series

7ffcc0d

thomasjpfan changed the title ~~[MRG] Only errors for pandas sparse arrays when pandas version < 0.24~~ [MRG] Errors for pandas sparse arrays as target Jul 4, 2019

thomasjpfan added 2 commits July 16, 2019 13:06

Merge remote-tracking branch 'upstream/master' into type_of_target_sp…

391ef9d

…arse

TST Removes monkeypatch

9eee19b

amueller approved these changes Jul 17, 2019

glemaitre merged commit cc64397 into scikit-learn:master Jul 19, 2019

jnothman mentioned this pull request Jul 28, 2019

[MRG] Release 0.20.4 #14443

Merged

11 tasks

5 participants
