Implemented "precision at recall k" and "recall at precision k"#20877
Implemented "precision at recall k" and "recall at precision k"#20877shubhraneel wants to merge 5 commits intoscikit-learn:mainfrom
Conversation
| "v_measure_score", | ||
| "zero_one_loss", | ||
| "brier_score_loss", | ||
| "precision_at_recall_k", |
I think that it would be better to call it max_precision_at_recall_k and max_recall_at_precision_k to make it obvious that this is the maximum that is taken.
When I hear "precision_at_recall_k" I think of a single number singled out from the precision-recall curve (given a line, if I constrain X=x_i, then Y=y_i). If I agree with that logic, then I think the original name is better to use.
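To ground the naming discussion, here is a minimal sketch (an illustration, not the PR's implementation) of how the proposed metric can be derived from the existing precision-recall curve; the "max_" prefix reflects the `.max()` taken over all qualifying thresholds:

import numpy as np
from sklearn.metrics import precision_recall_curve

def max_precision_at_recall_k(y_true, y_prob, k):
    # Hypothetical helper: best precision among all thresholds
    # whose recall is at least k.
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return precision[recall >= k].max()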
| def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
|     """Computes maximum recall for the thresholds when precision is greater
We should make it fit on a single line:
Maximum recall for a precision greater than `k`.
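For reference, a minimal sketch of what the function under discussion could look like with the suggested one-line summary (an assumption about the implementation, not the PR's final code):

from sklearn.metrics import precision_recall_curve

def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
    """Maximum recall for a precision greater than `k`."""
    precision, recall, _ = precision_recall_curve(
        y_true, y_prob, pos_label=pos_label, sample_weight=sample_weight
    )
    mask = precision > k
    # If no threshold reaches the requested precision, no recall qualifies.
    return recall[mask].max() if mask.any() else 0.0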
glemaitre left a comment
I made a partial review. I will give further comments a bit later.
| true positives and ``fn`` the number of false negatives. The recall is
| intuitively the ability of the classifier to find all the positive samples.
|
| Read more in the :ref:`User Guide <precision_recall_f_measure_metrics>`.
We will need to add a section in the user guide documentation.
I think that we can add an example to show the meaning of the two metrics graphically on a precision-recall curve. We can then reuse the image of the example in the user guide.
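Such an example could look roughly like this sketch (the data and plotting choices here are assumptions, not the final example):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
k = 0.75

precision, recall, _ = precision_recall_curve(y_true, y_prob)
plt.step(recall, precision, where="post", label="precision-recall curve")
# "precision at recall k": best precision to the right of this line.
plt.axvline(k, linestyle="--", label=f"recall = {k}")
# "recall at precision k": best recall above this line.
plt.axhline(k, linestyle=":", label=f"precision = {k}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()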
| The precision is the ratio ``tp / (tp + fp)`` where ``tp`` is the number of
| true positives and ``fp`` the number of false positives. The precision is
| intuitively the ability of the classifier not to label as positive a sample
| that is negative.
|
| The recall is the ratio ``tp / (tp + fn)`` where ``tp`` is the number of
| true positives and ``fn`` the number of false negatives. The recall is
| intuitively the ability of the classifier to find all the positive samples.
I am thinking that we could avoid repeating this description.
| When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
| ``pos_label`` is set to 1, otherwise an error will be raised.

Suggested change:

| When `pos_label=None`, if y_true is in {-1, 1} or {0, 1},
| `pos_label` is set to 1, otherwise an error will be raised.
| Returns
| -------
| recall_at_precision_k : float
|     Maximum recall when for the thresholds when precision is greater
There is something wrong with this sentence due to the repeated "when".
| See Also
| --------
| precision_recall_curve : Compute precision-recall curve.
| plot_precision_recall_curve : Plot Precision Recall Curve for binary
We should not link to `plot_precision_recall_curve` because it will be deprecated soon.
| precision_recall_curve : Compute precision-recall curve.
| plot_precision_recall_curve : Plot Precision Recall Curve for binary
|     classifiers.
| PrecisionRecallDisplay : Precision Recall visualization.
In addition, we should add both the .from_estimator and .from_predictions methods.
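For context, a short usage sketch of the two class methods the comment refers to (both exist on PrecisionRecallDisplay since scikit-learn 1.0):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Build the display from a fitted estimator...
PrecisionRecallDisplay.from_estimator(clf, X_test, y_test)
# ...or from precomputed scores.
PrecisionRecallDisplay.from_predictions(y_test, clf.predict_proba(X_test)[:, 1])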
| >>> k = 0.75
| >>> recall_at_precision_k(y_true, y_prob, k)
| 1.0
|
You should remove this blank line.
| >>> y_true = np.array([0, 0, 1, 1, 1, 1])
| >>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
| >>> k = 0.75
| >>> recall_at_precision_k(y_true, y_prob, k)
It might be better to take a threshold for which the score is not 1.0.
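For instance, with the same data a stricter k gives a value below 1.0 (assuming the sketch implementation above, which takes the best recall among thresholds whose precision exceeds k):

>>> y_true = np.array([0, 0, 1, 1, 1, 1])
>>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
>>> recall_at_precision_k(y_true, y_prob, 0.9)
0.75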
glemaitre left a comment

Just adding the "Request changes" flag to indicate that this PR has been reviewed.
Reference Issues/PRs
Fixes #20266
What does this implement/fix? Explain your changes.
This PR implements the functions "precision at recall k" and "recall at precision k" in
sklearn.metrics. As mentioned in issue #20266 by Ryanglambert, these metrics are commonly used, for example, in facebook/mmf.