Fix r2_score behavior with a single sample (#11435) #12855
jnothman merged 15 commits into scikit-learn:master
sklearn/metrics/regression.py
    check_consistent_length(y_true, y_pred, sample_weight)

    if _num_samples(y_pred) < 2:
        msg = "Found an array of {0} sample(s), while the minimum number is 2."
Or maybe "R^2 score is not well-defined with less than two samples!", not sure though.
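As a minimal sketch of the guard under discussion (not the merged scikit-learn code): `r2_score_sketch` below is a hypothetical pure-Python stand-in for 1-D inputs, and `UndefinedMetricWarning` is redefined locally as a `UserWarning` subclass so the snippet runs without scikit-learn. With fewer than two samples it warns and returns NaN instead of computing a score.

```python
import math
import warnings


class UndefinedMetricWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.UndefinedMetricWarning."""


def r2_score_sketch(y_true, y_pred):
    """Hypothetical pure-Python R^2 for 1-D inputs; illustration only."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Guard discussed in this review: R^2 is undefined for fewer than two samples.
    if len(y_pred) < 2:
        warnings.warn(
            "R^2 score is not well-defined with less than two samples.",
            UndefinedMetricWarning,
        )
        return math.nan

    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumption: constant y_true (ss_tot == 0) is out of scope for this sketch
    # and would raise ZeroDivisionError here.
    return 1.0 - ss_res / ss_tot
```

Returning NaN rather than raising keeps callers that aggregate scores running, while the warning still flags the degenerate input.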
Please also add a test and check if the correct warning is raised and the expected nan value is returned.
    warning_msg = "not well-defined with less than two samples."

    # Check all metrics which are degenerate when passed a single sample.
    for metric in [r2_score]:
You can pass this to the test function as a parameter using pytest.mark.parametrize.
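A rough sketch of what that suggestion could look like. The names `single_sample_metric` and `test_single_sample` are hypothetical, and the warning class is a local stand-in so the example does not depend on scikit-learn; in the real test the parameter list would hold `r2_score` and any other affected metrics.

```python
import math
import warnings

import pytest


class UndefinedMetricWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.UndefinedMetricWarning."""


def single_sample_metric(y_true, y_pred):
    """Hypothetical placeholder for a metric such as r2_score."""
    if len(y_pred) < 2:
        warnings.warn(
            "R^2 score is not well-defined with less than two samples.",
            UndefinedMetricWarning,
        )
        return math.nan
    raise NotImplementedError("only the single-sample path is sketched here")


# Each list entry becomes its own test case, replacing `for metric in [...]`.
@pytest.mark.parametrize("metric", [single_sample_metric])
def test_single_sample(metric):
    # pytest.warns fails the test if no matching warning is raised.
    with pytest.warns(UndefinedMetricWarning, match="not well-defined"):
        score = metric([1.0], [1.0])
    assert math.isnan(score)
```

Compared with a loop inside one test, parametrization reports each metric as a separate test, so a failure names the metric that broke.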
    # Check all metrics which are degenerate when passed a single sample.
    for metric in [r2_score]:
        with warnings.catch_warnings(record=True) as w:
from sklearn.metrics.regression import _check_reg_targets

from ...exceptions import DataDimensionalityWarning
Wouldn't UndefinedMetricWarning be more apt? Or even a simple UserWarning. What do you think, @jnothman?
You're right, UndefinedMetricWarning makes more sense with that warning message and is more consistent with other warnings. Let's see what @jnothman has to say.
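To illustrate the trade-off between a dedicated category and a plain UserWarning, here is a small sketch using only the standard `warnings` module. The class is a local stand-in (assumed, like scikit-learn's, to subclass `UserWarning`), and `emit`, `raises_as_error`, and `caught_as_user_warning` are hypothetical helper names.

```python
import warnings


class UndefinedMetricWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.UndefinedMetricWarning."""


def emit():
    warnings.warn("metric is undefined for this input", UndefinedMetricWarning)


def raises_as_error():
    """A dedicated category can be escalated to an error on its own."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=UndefinedMetricWarning)
        try:
            emit()
        except UndefinedMetricWarning:
            return True
    return False


def caught_as_user_warning():
    """Code that filters on plain UserWarning still sees the subclass."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", category=UserWarning)
        emit()
    return [issubclass(w.category, UserWarning) for w in caught]
```

A specific subclass lets users silence or escalate only undefined-metric cases, without affecting unrelated user warnings.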
Please add an entry to the change log at
Thanks, @psendyk!
… (scikit-learn#12855)" This reverts commit 3796e04.
Reference Issues/PRs
Partially fixes #11435 (see other comments).
What does this implement/fix? Explain your changes.
It fixes the behavior of r2_score when passed a single sample. As explained in #11435, it raises a warning and returns a NaN value.
Any other comments?
There are more metrics which are degenerate with a single sample and need that fix.