Implemented "precision at recall k" and "recall at precision k"#20877
Implemented "precision at recall k" and "recall at precision k"#20877shubhraneel wants to merge 5 commits intoscikit-learn:mainfrom
Conversation
| "v_measure_score", | ||
| "zero_one_loss", | ||
| "brier_score_loss", | ||
| "precision_at_recall_k", |
I think that it would be better to call it max_precision_at_recall_k and max_recall_at_precision_k to make it obvious that this is the maximum that is taken.
When I hear "precision_at_recall_k" I think of a single number singled out from the precision-recall curve (given a line, if I constrain X=x_i, then Y=y_i). If I agree with that logic, then I think the original name is better to use.
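To ground the naming discussion, here is a minimal sketch (an illustration, not the PR's implementation) of how the proposed metric can be derived from the existing precision-recall curve; the "max_" prefix reflects the `.max()` taken over all qualifying thresholds:

import numpy as np
from sklearn.metrics import precision_recall_curve

def max_precision_at_recall_k(y_true, y_prob, k):
    # Hypothetical helper: best precision among all thresholds
    # whose recall is at least k.
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return precision[recall >= k].max()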
| def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
|     """Computes maximum recall for the thresholds when precision is greater
We should make it fit on a single line:
Maximum recall for a precision greater than `k`.
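For reference, a minimal sketch of what the function under discussion could look like with the suggested one-line summary (an assumption about the implementation, not the PR's final code):

from sklearn.metrics import precision_recall_curve

def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
    """Maximum recall for a precision greater than `k`."""
    precision, recall, _ = precision_recall_curve(
        y_true, y_prob, pos_label=pos_label, sample_weight=sample_weight
    )
    mask = precision > k
    # If no threshold reaches the requested precision, no recall qualifies.
    return recall[mask].max() if mask.any() else 0.0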
glemaitre left a comment
I made a partial review. I will give further comments a bit later.
| true positives and ``fn`` the number of false negatives. The recall is
| intuitively the ability of the classifier to find all the positive samples.
|
| Read more in the :ref:`User Guide <precision_recall_f_measure_metrics>`.
We will need to add a section in the user guide documentation.
I think that we can add an example to show the meaning of the two metrics graphically on a precision-recall curve. We can then reuse the image of the example in the user guide.
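Such an example could look roughly like this sketch (the data and plotting choices here are assumptions, not the final example):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
k = 0.75

precision, recall, _ = precision_recall_curve(y_true, y_prob)
plt.step(recall, precision, where="post", label="precision-recall curve")
# "precision at recall k": best precision to the right of this line.
plt.axvline(k, linestyle="--", label=f"recall = {k}")
# "recall at precision k": best recall above this line.
plt.axhline(k, linestyle=":", label=f"precision = {k}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()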
| The precision is the ratio ``tp / (tp + fp)`` where ``tp`` is the number of
| true positives and ``fp`` the number of false positives. The precision is
| intuitively the ability of the classifier not to label as positive a sample
| that is negative.
|
| The recall is the ratio ``tp / (tp + fn)`` where ``tp`` is the number of
| true positives and ``fn`` the number of false negatives. The recall is
| intuitively the ability of the classifier to find all the positive samples.
I am thinking that we could avoid repeating this description.
| When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
| ``pos_label`` is set to 1, otherwise an error will be raised.

Suggested change:

| When `pos_label=None`, if y_true is in {-1, 1} or {0, 1},
| `pos_label` is set to 1, otherwise an error will be raised.
| Returns
| -------
| recall_at_precision_k : float
|     Maximum recall when for the thresholds when precision is greater
There is something wrong with this sentence due to the repeated "when".
| See Also
| --------
| precision_recall_curve : Compute precision-recall curve.
| plot_precision_recall_curve : Plot Precision Recall Curve for binary
We should not link to `plot_precision_recall_curve` because it will be deprecated soon.
| precision_recall_curve : Compute precision-recall curve.
| plot_precision_recall_curve : Plot Precision Recall Curve for binary
|     classifiers.
| PrecisionRecallDisplay : Precision Recall visualization.
In addition, we should add both the .from_estimator and .from_predictions methods.
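For context, a short usage sketch of the two class methods the comment refers to (both exist on PrecisionRecallDisplay since scikit-learn 1.0):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Build the display from a fitted estimator...
PrecisionRecallDisplay.from_estimator(clf, X_test, y_test)
# ...or from precomputed scores.
PrecisionRecallDisplay.from_predictions(y_test, clf.predict_proba(X_test)[:, 1])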
| >>> k = 0.75
| >>> recall_at_precision_k(y_true, y_prob, k)
| 1.0
|
You should remove this blank line.
| >>> y_true = np.array([0, 0, 1, 1, 1, 1])
| >>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
| >>> k = 0.75
| >>> recall_at_precision_k(y_true, y_prob, k)
It might be better to take a threshold for which the score is not 1.0.
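For instance, with the same data a stricter k gives a value below 1.0 (assuming the sketch implementation above, which takes the best recall among thresholds whose precision exceeds k):

>>> y_true = np.array([0, 0, 1, 1, 1, 1])
>>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
>>> recall_at_precision_k(y_true, y_prob, 0.9)
0.75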
glemaitre left a comment

Just adding the "Request changes" flag to indicate that this PR has been reviewed.
Reference Issues/PRs
Fixes #20266
What does this implement/fix? Explain your changes.
This PR implements the functions "precision at recall k" and "recall at precision k" in
sklearn.metrics. As mentioned in issue #20266 by Ryanglambert, these metrics are commonly used, for example, in facebook/mmf.