FEA Adds decision_threshold_curve function #31338

lucyleeow wants to merge 62 commits into scikit-learn:main from …/scikit-learn into metric_threshold_curve

Conversation
    .. currentmodule:: sklearn.model_selection

    .. _TunedThresholdClassifierCV:
    .. _threshold_tunning:
I referenced this page starting here for `decision_threshold_curve` because I thought this first paragraph was appropriate for context, not just the section I added. Not 100% on this though, and happy to change.
    .. _threshold_tunning:

    ==================================================
    Tuning the decision threshold for class prediction
@glemaitre Can't comment where this is relevant (after L49), but I wonder if it would be interesting to add another scenario where threshold tuning may be of interest: imbalanced datasets?
    good, or a loss function, meaning low is good. In the latter case, the
    output of `score_func` will be sign-flipped.

    labels : array-like, default=None
`_CurveScorer` uses the term "classes", but "labels" is consistent with what is used for other classification metrics, so I chose this.
    between the minimum and maximum of `y_score`. If an array-like, it will be
    used as the thresholds.

    greater_is_better : bool, default=True
Again, this is consistent with the term used in other metrics, so I avoided using "sign".
    # This could also be done in `decision_threshold_curve`, not sure which
    # is better.
    y_true_unique = cached_unique(y_true)
    if classes is None:
        classes = y_true_unique
    # Not sure if this separate error msg is needed; there is the possibility
    # that the `set(classes) != set(y_true_unique)` check fails because
    # `y_true` only contains one class.
    if len(y_true_unique) == 1:
        raise ValueError("`y_true` only contains one class label.")
    if set(classes) != set(y_true_unique):
        raise ValueError(
            f"`classes` ({classes}) is not equal to the unique values found in "
            f"`y_true` ({y_true_unique})."
        )
These checks could be done in `decision_threshold_curve` instead; not sure which is better.
            for th in potential_thresholds
        ]
        return np.array(score_thresholds), potential_thresholds
        # why 'potential'?
Just for my education, why use the term "potential" in "potential_thresholds"? Is it because there is a possibility that a threshold is redundant, because the predicted labels are the same for adjacent thresholds?
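A quick illustration of that hypothesized redundancy, with hypothetical toy data: most uniformly spaced thresholds between two distinct scores produce identical predictions, and therefore identical scores.

    import numpy as np

    y_score = np.array([0.1, 0.8])
    # Uniformly spaced thresholds between the min and max of y_score:
    for th in np.linspace(0.1, 0.8, 5):
        print(th, (y_score >= th).astype(int))
    # 0.1   [1 1]
    # 0.275 [0 1]  <- these four thresholds all yield the same
    # 0.45  [0 1]     predictions, so their scores are redundant
    # 0.625 [0 1]
    # 0.8   [0 1]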
@glemaitre I've highlighted questions in the code. Tests still to be added.
jeremiedbb left a comment
Thanks for the PR @lucyleeow. I haven't looked at the whole PR yet. I'm wondering if there's a way to not have threshold as a parameter.
    thresholds : int or array-like, default=20
        Specifies the number of decision thresholds to compute the score for. If
        an integer, it will be used to generate `thresholds` thresholds uniformly
        distributed between the minimum and maximum of `y_score`. If an
        array-like, it will be used as the thresholds.
I believe we can compute the thresholds automatically, like in the other curve functions. `score_func` is called on `y_true` (independent of the threshold) and `y_pred`, where `y_pred` is computed from `y_score` and the threshold. If we sort `y_score` in ascending order, it's only where the value of the sorted `y_score` changes that we need to set a threshold. That way we compute the exact number of thresholds. We can probably leverage some code from `_binary_clf_curve` (`sklearn/metrics/_ranking.py`, lines 891 to 906 at 18e89a4).
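A minimal sketch of that idea (the helper name is hypothetical; this is not the code from `_binary_clf_curve` itself):

    import numpy as np

    def distinct_thresholds(y_score):
        # Sort scores in ascending order; predictions only change where the
        # sorted score value changes, so those values are the only
        # thresholds worth evaluating.
        sorted_scores = np.sort(np.asarray(y_score))
        is_new_value = np.r_[True, np.diff(sorted_scores) > 0]
        return sorted_scores[is_new_value]

    distinct_thresholds([0.1, 0.4, 0.4, 0.8])  # array([0.1, 0.4, 0.8])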
This is a good point; I have added a `thresholds=None` option that does the above. Would you care to take a quick look at the overall PR before I add tests, as I am not sure about a few things? Thanks 🙏
Sorry for the delay; there was something puzzling me with the approach taken in this PR and I had a hard time figuring out precisely what it was. In particular, the need for `_CurveScorer`. To me the issue is the use of the scorer machinery, when we could follow the approach of `_binary_clf_curve` (`sklearn/metrics/_ranking.py`, line 830 at ea9e824). The difference is that instead of computing the true positives and false positives, we'd compute any (threshold-dependent) metric. Briefly, this is done in 3 steps, sketched below.
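A hedged sketch of one way to read those 3 steps (`metric_per_threshold` and the counts-based `metric` callable are hypothetical illustrations, not scikit-learn API):

    import numpy as np

    def metric_per_threshold(y_true, y_score, metric):
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        # Step 1: sort scores in descending order (stable sort, as in
        # `_binary_clf_curve`) and keep the indices where the score changes.
        order = np.argsort(y_score, kind="mergesort")[::-1]
        y_true, y_score = y_true[order], y_score[order]
        distinct = np.r_[np.where(np.diff(y_score))[0], y_true.size - 1]
        # Step 2: cumulative confusion-matrix counts at each threshold; at
        # index i, the i + 1 highest-scoring samples are predicted positive.
        tps = np.cumsum(y_true)[distinct]
        fps = 1 + distinct - tps
        fns = y_true.sum() - tps
        tns = (y_true.size - y_true.sum()) - fps
        # Step 3: evaluate any counts-based metric at each threshold.
        scores = [metric(*c) for c in zip(tps, fps, fns, tns)]
        return np.array(scores), y_score[distinct]

    def f1_from_counts(tp, fp, fn, tn):
        return 2 * tp / (2 * tp + fp + fn)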
I think that we can extract most of the logic from `_binary_clf_curve`. This is related to #30134, where the goal is to make `_binary_clf_curve` reusable for this kind of computation.
Interesting, I've reviewed #30134, hopefully that will be merged soon and I can look at your suggested changes after!
@jeremiedbb I've tried your approach in a separate PR, #32732, as I thought it was nice to be able to see the two implementations side by side, and it would also have been complicated, history/changes-wise, to do it all in this PR. I think your suggested approach is better; please share your thoughts!
Closing in favour of #32732.
Reference Issues/PRs
closes #25639 (supersedes)
closes #21391
Thought it would be easier to open a new PR, and there were no long discussions on the old PR (#25639).
What does this implement/fix? Explain your changes.
Adds a function that takes a scoring function, `y_true`, `y_score`, and `thresholds`, and outputs the score per threshold. The intention is to later add a new display class using this function, allowing us to plot the metric score per threshold.
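For illustration, a self-contained stand-in for the proposed function, following the signature described in this PR (the body is a simplification, not the PR's actual implementation):

    import numpy as np
    from sklearn.metrics import f1_score

    def decision_threshold_curve(score_func, y_true, y_score, thresholds=20):
        # If an integer, generate thresholds uniformly between the min and
        # max of y_score; otherwise use the given array-like directly.
        if isinstance(thresholds, int):
            thresholds = np.linspace(y_score.min(), y_score.max(), thresholds)
        scores = np.array(
            [score_func(y_true, (y_score >= th).astype(int)) for th in thresholds]
        )
        return scores, thresholds

    y_true = np.array([0, 0, 1, 1, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
    scores, thresholds = decision_threshold_curve(
        f1_score, y_true, y_score, thresholds=5
    )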
Uses the `_CurveScorer`, as suggested by @glemaitre. Refactors out a new `_scores_from_predictions` static method that takes the predictions. The old `_score` takes the estimator, calculates `y_score`, and passes it to the new `_scores_from_predictions`.

Notes:

- `_scores_from_predictions`: the name is inspired by the display class methods `from_predictions`, but happy to change.
- `from_scorer`: because we are not using a scorer (which has the signature `callable(estimator, X, y)`) but a scoring function with the signature `score_func(y_true, y_pred, **kwargs)`, we could have instantiated `_CurveScorer` directly instead, and then called `_scores_from_predictions` in `decision_threshold_curve`.
- Decided to make `_scores_from_predictions` a static method, but I also could have made it a method, and in `decision_threshold_curve` instantiated `_CurveScorer` directly first (not via `from_scorer`). I only went with the staticmethod path because I initially didn't register that `from_scorer` instantiates differently than directly via `_CurveScorer`. Not 100% on what is best here. [Edit: realised that a method is nicer, as it avoids having too many params in `_scores_from_predictions` and avoids some lines of code (use `self.xx` directly, instead of passing `self.xx` to `_scores_from_predictions`).] A sketch of this structure follows below.
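A hypothetical simplification of the structure described above (the real `_CurveScorer` lives in `sklearn.metrics._scorer` and differs in its details):

    import numpy as np

    class _CurveScorerSketch:
        def __init__(self, score_func, thresholds, **kwargs):
            self._score_func = score_func
            self._thresholds = thresholds
            self._kwargs = kwargs

        @staticmethod
        def _scores_from_predictions(score_func, y_true, y_score, thresholds, **kwargs):
            # New static method: scores computed from predictions alone.
            if isinstance(thresholds, int):
                thresholds = np.linspace(y_score.min(), y_score.max(), thresholds)
            scores = np.array(
                [score_func(y_true, (y_score >= th).astype(int), **kwargs)
                 for th in thresholds]
            )
            return scores, np.asarray(thresholds)

        def _score(self, estimator, X, y_true):
            # Old entry point: compute y_score from the estimator, then delegate.
            y_score = estimator.predict_proba(X)[:, 1]
            return self._scores_from_predictions(
                self._score_func, y_true, y_score, self._thresholds, **self._kwargs
            )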
Any other comments?

cc @glemaitre
Lots more to do, but should get implementation right first.
To do:

- `_decision_threshold.py` module docstring