[MRG+1] Raise error when SparseSeries is passed into classification metrics#7373
[MRG+1] Raise error when SparseSeries is passed into classification metrics#7373GaelVaroquaux merged 20 commits intoscikit-learn:masterfrom
Conversation
sklearn/metrics/ranking.py
Outdated
| import warnings | ||
| import numpy as np | ||
| from scipy.sparse import csr_matrix | ||
| import pandas |
There was a problem hiding this comment.
Hi, thanks for the PR. But AFAIR, it is a convention in general to import pandas and also put it in a try catch block ( like here ). I suppose this is necessary since Pandas is not a compulsory requirement for installing scikit-learn and this import would raise an error in case Pandas isn't installed. Other than that, LGTM.
There was a problem hiding this comment.
actually, you can not import pandas AT ALL. What you refer to is a test, and you can't really do that same thing here. You can check the class name, though, if you like.
There was a problem hiding this comment.
@amueller sorry for my ignorance. Will keep it in mind from next time.
There was a problem hiding this comment.
@maniteja123 don't sweat it, any help is appreciated :)
There was a problem hiding this comment.
Okay. I will be removing this line. Thanks.
|
This shouldn't just be in |
|
@jnothman, just for clarity, since Thanks |
…arget' function. Finally, add 'type_of_target' call to _binary_clf_curve
Update from Original
Update from Original
|
I've removed the |
|
you should add a test. I'm not sure if we have a mock for a sparse series but there are several pandas mocks that you could use. |
|
Just to make sure I understand correctly... I should add a new test to this code in sklean/utils/tests/test_multiclass.py, correct? |
|
yeah that sounds like a reasonable place |
sklearn/metrics/ranking.py
Outdated
| """ | ||
| # Check to make sure y_true is valid | ||
| y_type = type_of_target(y_true) | ||
| if y_type != "binary": |
There was a problem hiding this comment.
Also, in some of the other testing I'm getting the error: ValueError: multiclass format is not supported. I was under the impression that the _binary_clf_curve required 'binary' data. Should it also be allowed to accept 'multiclass' data?
| from pandas import SparseSeries | ||
| except ImportError: | ||
| pass | ||
| y = SparseSeries([1, 0, 0, 1, 0]) |
There was a problem hiding this comment.
So I'm seeing the error in the automatic checks that states: UnboundLocalError: local variable 'SparseSeries' referenced before assignment Do I need to put all of the test code within the try block?
…l_curve_pos_label since as multiclass it doesn't make sense
…inary_clf_curve to test new logic in _binary_clf_curve function
|
Okay. I think it should be good now. I understand the supposed issue with |
| msg = "y cannot be class 'SparseSeries'." | ||
| assert_raises_regex(ValueError, msg, type_of_target, y) | ||
| except ImportError: | ||
| pass |
There was a problem hiding this comment.
Please wrap only the import statement and use raise SkipTest("Pandas not found") as elsewhere
| y = SparseSeries([1, 0, 0, 1, 0]) | ||
| msg = "y cannot be class 'SparseSeries'." | ||
| assert_raises_regex(ValueError, msg, type_of_target, y) | ||
|
|
There was a problem hiding this comment.
Per @jnothman 's request, I've only wrapped the import and added raise SkipTest("Pandas not found") otherwise.
| y_type = type_of_target(y_true) | ||
| if not (y_type == "binary" or | ||
| (y_type == "multiclass" and pos_label is not None)): | ||
| raise ValueError("{0} format is not supported".format(y_type)) |
There was a problem hiding this comment.
It's a nitpick, but it would help the user to give a different error message if y_type == "multiclass" and pos_label is None.
Beside, I am surprised, but it is really the case that in multiclass settings we require the pos_label not to be specified? I would have though the opposite. Is there an error in the condition above, or in my assumptions on our code?
There was a problem hiding this comment.
As @jnothman pointed out: the code is correct, I was confused by the double negation.
Still, a different error message would help.
There was a problem hiding this comment.
Still, a different error message would help.
Agreed. The error message template I generally try to follow is something like:
Allowed values for parameter_name are ['value1', 'value2', 'value3']. Instead you provided 'parameter_name={parameter_value}'
There was a problem hiding this comment.
@lesteve, my choice of error message was copy pasted from other portions of this code. I chose the language to be consistent with the other instances of similar errors in ranking.py.
|
Hmmm I am a bit confused on this one. I commented on the issue, see #7352 (comment). |
|
Gael i think the confusion is yours in reading a double negation
…On 1 Jun 2017 9:59 pm, "Loïc Estève" ***@***.***> wrote:
Hmmm I am a bit confused on this one. I commented on the issue, see #7352
(comment)
<#7352 (comment)>
.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7373 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz62XaeqMndbeD52eOJWf6AAP9osh8ks5r_qejgaJpZM4J4jvQ>
.
|
| raise ValueError('Expected array-like (array or non-string sequence), ' | ||
| 'got %r' % y) | ||
|
|
||
| sparseseries = (y.__class__.__name__ == 'SparseSeries') |
There was a problem hiding this comment.
Just testing the name of the class is a bit dodgy, I think it would be better to use an isinstance.
There was a problem hiding this comment.
So, I've gone back to using the name of the class, per prior comments from @amueller on commit d21c7e38674388f97e146aef67f42bef2fe5d2d2, pandas should not be imported at all except in test.
|
You could be more specific, if that's a concern, by checking the type's
module name starts with 'pandas.'. you could also avoid __class__ with
type() etc. You could check the object's mro to see if it is a subclass of
SparseSeries.
, as I've said, I think this is fine
…On 2 Jun 2017 7:31 am, "nielsenmarkus11" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/utils/multiclass.py
<#7373 (comment)>
:
> @@ -234,6 +234,10 @@ def type_of_target(y):
raise ValueError('Expected array-like (array or non-string sequence), '
'got %r' % y)
+ sparseseries = (y.__class__.__name__ == 'SparseSeries')
So, I've gone back to using the name of the class, per prior comments from
@amueller <https://github.com/amueller> on commit
d21c7e3
<d21c7e3>.
Per his comments pandas should not be imported at all except in test.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7373 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz67_7fTexHGOhdpageYjs3sFNq_shks5r_y2ZgaJpZM4J4jvQ>
.
|
|
Another review here? |
|
Ping @amueller |
|
LGTM. Merging given @jnothman 's +1 |
|
Thanks! |
|
Thank you! |
…etrics (scikit-learn#7373) * Raise error when SparseSeries is passed into roc_curve * Changed "y_true" in second if block to "y_score" * Remove code to import pandas and add sparseseries check to 'type_of_target' function. Finally, add 'type_of_target' call to _binary_clf_curve * Remove pandas import and old comparison in roc_curve. * Add test for 'type_of_target' function * Add white space after commas * Correct other white space issues * Move type_of_target test into try clause, remove test_precision_recall_curve_pos_label since as multiclass it doesn't make sense * Add test_precision_recall_curve_pos_label back in and also add test_binary_clf_curve to test new logic in _binary_clf_curve function * Correct syntax and formatting. * Remove trailing white space * Correct validation logic * Update test_multiclass.py per @jnothman 's request. * Import SkipTest function. * Remove extra white space from line 303
* tag '0.19.1': (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* releases: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
* dfsg: (117 commits) TST Improve SelectFromModel tests (scikit-learn#9733) Name in what's new [MRG+1] Raise error when SparseSeries is passed into classification metrics (scikit-learn#7373) Fix LogisticRegressionCV default solver value in docstring (scikit-learn#9962) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) [MRG+1] DOC fix sign in GBRT mathematical formulation (scikit-learn#9885) DOC fix a typo (scikit-learn#9892) [MRG+1] Ledoit-Wolf behavior explanation (scikit-learn#9500) [MRG+1] Fix typos in documentation (scikit-learn#9878) DOC: Use setattr(self, ...) instead of self.setattr(...) (scikit-learn#9866) DOC Removed a duplicate occurrence of a word in 'sklearn.neighbors.KNeighborsRegressor' docs (scikit-learn#9862) FIX docstring of negative_outlier_factor_ in LOF (scikit-learn#9809) [MRG+1] Fix scikit-learn#9743: Adding parameter information to docstring. (scikit-learn#9757) DOC: fix docstring of Imputer.fit (scikit-learn#9769) various minor spelling tweaks (scikit-learn#9783) MAINT comment on apparent inconsistency [MRG+1] DOC fix headers level in cross_validation.rst (scikit-learn#9679) Fix mailmap format (scikit-learn#9620) DOC Fix typos (scikit-learn#9577) Typo (scikit-learn#9571) ...
…etrics (scikit-learn#7373) * Raise error when SparseSeries is passed into roc_curve * Changed "y_true" in second if block to "y_score" * Remove code to import pandas and add sparseseries check to 'type_of_target' function. Finally, add 'type_of_target' call to _binary_clf_curve * Remove pandas import and old comparison in roc_curve. * Add test for 'type_of_target' function * Add white space after commas * Correct other white space issues * Move type_of_target test into try clause, remove test_precision_recall_curve_pos_label since as multiclass it doesn't make sense * Add test_precision_recall_curve_pos_label back in and also add test_binary_clf_curve to test new logic in _binary_clf_curve function * Correct syntax and formatting. * Remove trailing white space * Correct validation logic * Update test_multiclass.py per @jnothman 's request. * Import SkipTest function. * Remove extra white space from line 303
…etrics (scikit-learn#7373) * Raise error when SparseSeries is passed into roc_curve * Changed "y_true" in second if block to "y_score" * Remove code to import pandas and add sparseseries check to 'type_of_target' function. Finally, add 'type_of_target' call to _binary_clf_curve * Remove pandas import and old comparison in roc_curve. * Add test for 'type_of_target' function * Add white space after commas * Correct other white space issues * Move type_of_target test into try clause, remove test_precision_recall_curve_pos_label since as multiclass it doesn't make sense * Add test_precision_recall_curve_pos_label back in and also add test_binary_clf_curve to test new logic in _binary_clf_curve function * Correct syntax and formatting. * Remove trailing white space * Correct validation logic * Update test_multiclass.py per @jnothman 's request. * Import SkipTest function. * Remove extra white space from line 303
Reference Issue
Fixes #7352
What does this implement/fix? Explain your changes.
This change raises an error when the type is a pandas SparseSeries of either the
y_trueory_scoreinput.Any other comments?