Skip to content

Improve error message for sparse multilabel-indicator y in RandomForestClassifier#15971

Merged
jnothman merged 10 commits intoscikit-learn:masterfrom
rushabh-v:master
Feb 10, 2020
Merged

Improve error message for sparse multilabel-indicator y in RandomForestClassifier#15971
jnothman merged 10 commits intoscikit-learn:masterfrom
rushabh-v:master

Conversation

@rushabh-v
Copy link
Copy Markdown
Contributor

@rushabh-v rushabh-v commented Dec 25, 2019

Fixes #15958 .

@rushabh-v rushabh-v changed the title edit the error message Improve error message for sparse multilabel-indicator y in RandomForestClassifier Dec 25, 2019
rth
rth previously requested changes Dec 26, 2019
Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Can somebody please guide me to get rid of these failing codecov checks?

You need to add a unit tests that passes a sparse y and checks that the correct error message is raised with,

msg = "multilabel-indicator of type sparse is not supported"
with pytest.raises(ValueError, match=msg):
    est.fit(X, y)

@rushabh-v rushabh-v requested a review from rth December 27, 2019 01:28
@rushabh-v
Copy link
Copy Markdown
Contributor Author

rushabh-v commented Dec 27, 2019

I have changed the location of the condition because the call y = np.atleast_1d(y) in the fit function changes the instance of the sparse matrices. I think we should put the condition at the very beginning of the fit function. Are there any other suggestions @rth ?

@rushabh-v
Copy link
Copy Markdown
Contributor Author

rushabh-v commented Dec 28, 2019

Can we change value of the argement accept_sparse in the call y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)?

@jnothman
Copy link
Copy Markdown
Member

Sorry, you're right, this should be improved....

Can we change value of the argement accept_sparse in the call y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)?

That seems like the right thing to do.... Why do we say that we accept sparse y there?

DecisionTreeClassifier checks y and does not accept sparse....

@rushabh-v rushabh-v requested a review from jnothman December 29, 2019 05:21
@thomasjpfan
Copy link
Copy Markdown
Member

DecisionTreeClassifier checks y and does not accept sparse....

Looks like DecisionTreeClassifier use to accept it and was changed in 409c888

The forest was not updated to reflect this change.

@rushabh-v
Copy link
Copy Markdown
Contributor Author

rushabh-v commented Jan 9, 2020

Looks like DecisionTreeClassifier use to accept it and was changed in 409c888
The forest was not updated to reflect this change.

Then is this PR going to be merged or it should be closed now?

@rushabh-v rushabh-v closed this Jan 9, 2020
@rushabh-v rushabh-v reopened this Jan 9, 2020
@thomasjpfan
Copy link
Copy Markdown
Member

This PR is nice to have since it specifies y as the problem.

For the regular tree code, we do not do this explicit check and use the default check_array error message which is ambiguous about which array is causing the error.

@rushabh-v
Copy link
Copy Markdown
Contributor Author

This PR is nice to have since it specifies y as the problem.

For the regular tree code, we do not do this explicit check and use the default check_array error message which is ambiguous about which array is causing the error.

Oh, okay.

@rushabh-v
Copy link
Copy Markdown
Contributor Author

Can you merge this, please? @NicolasHug

@jnothman jnothman merged commit 3f0b6c0 into scikit-learn:master Feb 10, 2020
@jnothman
Copy link
Copy Markdown
Member

Thanks @rushabh-v

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RandomForestClassifier does not handle sparse multilabel-indicator y

5 participants