[MRG+1] Improve the error message for some metrics when the shape of sample_weight is inappropriate#9903
[MRG+1] Improve the error message for some metrics when the shape of sample_weight is inappropriate#9903TomDLT merged 2 commits intoscikit-learn:masterfrom qinhanmin2014:metric_err_message
Conversation
|
That's pretty nice you did it via the common tests, well done! LGTM. A small tip for PEP8, you should spend some time configuring your editor to have on-the-fly flake8 checks, this saves some time in the long run! Another small tip you can link to a given comment in an issue e.g. #9786 (comment) (I edited your message accordingly). This makes it a lot easier to find the comment you are referring to. Note what I had originally in mind was something a bit more generic than just doing it for metrics i.e. checking that each time a function signature had |
|
@lesteve Thanks a lot for the detailed instruction :)
I think at least it won't be difficult if we go through all the public functions. I'll do this in a couple of days and open PR/issue if necessary. |
Indeed ! Thanks @qinhanmin2014 |
…sample_weight is inappropriate (#9903)
|
ping @lesteve I tried my best to go through all the public methods (still hard to guarantee the completeness).
We do not always use check_consistent_length to check the shape of sample_weight. Sometimes, we use an if statement, it enables us to raise even more informative error message than check_sample_weight, e.g., Shapes of X and sample_weight do not match. Overall, I do not find any public function that can run through with inappropriate shape of sample_weight, but there are indeed some functions which do not check the shape of sample_weight, thus resulting in unclear error message. These include DummyClassifier
KernelRidge
LinearRegression, RANSACRegressor, Ridge, RidgeClassifier
MultinomialNB, ComplementNB, BernoulliNBBelow are the modules I go through which seems related to sample_weight. Some modules, e.g., sklearn.ensemble, seems to pass sample_weight directly to the underlying estimators, thus are not included. sklearn.cluster, sklearn.dummy, sklearn.isotonic
sklearn.kernel_ridge, sklearn.linear_mode, sklearn.naive_bayes
sklearn.svm, sklearn.treeWDYT? Does it worth a fix? Thanks :) Update: I have opened #9926 for further discussion. |
…sample_weight is inappropriate (scikit-learn#9903)
…sample_weight is inappropriate (scikit-learn#9903)
Reference Issue
proposed by @lesteve in #9786 (comment).
Fixes #9870
What does this implement/fix? Explain your changes.
Currently, many metrics do not explicitly check the shape of sample_weight. Instead, they rely on certain statement to block the code from running through, this may cause:
(1)Users can't get meaningful error message (e.g., now you may get
Axis must be specified when shapes of a and weights differor evenoperands could not be broadcast together with shapes (2,1) (3,1))(2)Sometimes all the statements fail to block the code and you even can't get an erorr (e.g., roc_auc_score previously)
The PR fixes the problem and improves the common test to ensure that meaningful error message is raised by all metrics with sample_weight.
Any other comments?
cc @jnothman @lesteve