[MRG] Errors for pandas sparse arrays as target #14125
glemaitre merged 7 commits into scikit-learn:master
sklearn/utils/multiclass.py
raise ValueError("y cannot be class 'SparseSeries'.")
sparse_pandas = (y.__class__.__name__ in ['SparseSeries', 'SparseArray'])
if sparse_pandas:
with suppress(ImportError):
Should there be a sparse efficiency warning?
CI error is unrelated.
When a user calls
No, upon check_array I suppose
I am split on the idea. Ideally we should, but it might be noisy given that we convert only the target.
I am okay with an efficiency warning in
I can see three ways to handle this:
Given that pandas using
I thought at some point that the sparse value in pandas was configurable.
Ah yes, it can be specified: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseArray.html using
At the moment this PR only updates a test to use
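For readers following the fill-value point, here is a minimal sketch of how the fill value affects what a pandas sparse array stores. It assumes a recent pandas where the class is exposed as `pd.arrays.SparseArray` and that the relevant parameter is `fill_value`. The variable names are illustrative and not taken from this PR.

```python
import numpy as np
import pandas as pd

data = [0.0, 0.0, 1.0, 2.0]

# With float data and no fill_value given, pandas uses NaN as the fill
# value, so the zeros are stored explicitly rather than treated as "empty".
default = pd.arrays.SparseArray(data)
print(default.fill_value)       # NaN
print(len(default.sp_values))   # all four values are stored

# Passing fill_value=0.0 makes zero the implicit value, which matches the
# usual scipy.sparse convention; only the non-zero entries are stored.
explicit = pd.arrays.SparseArray(data, fill_value=0.0)
print(explicit.fill_value)
print(explicit.sp_values)
```

With the default fill value, a target that is mostly zeros gains nothing from the sparse representation, which is one reason the zero-value convention matters when converting to other sparse formats.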
I haven't followed the whole discussion but this seems good?
Thanks @thomasjpfan
Reference Issues/PRs
Resolves #14002
Resolves #14005
What does this implement/fix? Explain your changes.
With pandas >= 0.24, we can support pandas sparse arrays (see #14005 (comment)). But given how pandas uses np.nan as the zero value, this PR will continue to raise an error for pandas sparse arrays.
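The behaviour described above can be sketched roughly as follows. This is not the code merged in this PR: the helper name `reject_sparse_pandas_target`, the stand-in class, and the error message are hypothetical. Only the class-name check mirrors the diff fragment shown in the review.

```python
class SparseArray:
    """Hypothetical stand-in for a pandas sparse container (not the real class)."""


def reject_sparse_pandas_target(y):
    """Raise ValueError if y looks like a pandas sparse container.

    The check compares the class name only, so pandas does not need
    to be imported for the check to work.
    """
    if y.__class__.__name__ in ("SparseSeries", "SparseArray"):
        # Illustrative message; the wording in scikit-learn may differ.
        raise ValueError("y cannot be class 'SparseSeries' or 'SparseArray'")
    return y


# A plain list passes through unchanged.
print(reject_sparse_pandas_target([0, 1, 1]))

# An object whose class is named 'SparseArray' is rejected.
try:
    reject_sparse_pandas_target(SparseArray())
except ValueError as exc:
    print(exc)
```

Checking the class name rather than using `isinstance` keeps pandas an optional dependency, at the cost of also matching unrelated classes that happen to share the name.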