ENH/FIX add drop_intermediate to DET curve and add threshold at infinity#29151
ENH/FIX add drop_intermediate to DET curve and add threshold at infinity#29151lorentzenchr merged 36 commits intoscikit-learn:mainfrom
Conversation
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
|
@ArturoAmorQ Could you also place the written-out name "Detection error tradeoff" (DET) in the docstrings of |
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
ogrisel
left a comment
There was a problem hiding this comment.
Thanks for the PR, here is some feedback.
|
Should I add a change log? |
ogrisel
left a comment
There was a problem hiding this comment.
Here is another pass. Please include a changelog entry to document the change in the array attributes of the display object.
|
On top of the changelog entry, we might want to also include a EDIT: actually the attributes are class constructor parameters so maybe it's a bit weird, even though the typical user will not pass the constructor parameters but instead use a factory method. |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel
left a comment
There was a problem hiding this comment.
A final comment but otherwise still LGTM.
sklearn/metrics/_ranking.py
Outdated
| ----- | ||
| .. versionchanged:: 1.6 | ||
| An arbritrary threshold at infinity is added to represent a classifier | ||
| that always predicts the negative class, i.e. `fpr=0` and `fnr=1`. |
There was a problem hiding this comment.
I would rather move this below the .. versionadded:: 0.24 after the main paragraph of the docstring instead of adding a new "Notes" section to this docstring.
There was a problem hiding this comment.
I think that we should make this information consistent in both det_curve and roc_curve.
I might change the roc_curve in this PR as well.
There was a problem hiding this comment.
I'd say a new PR. (this one takes long enough already.)
glemaitre
left a comment
There was a problem hiding this comment.
Uhm it seems I did not send this review.
sklearn/metrics/_ranking.py
Outdated
| ----- | ||
| .. versionchanged:: 1.6 | ||
| An arbritrary threshold at infinity is added to represent a classifier | ||
| that always predicts the negative class, i.e. `fpr=0` and `fnr=1`. |
There was a problem hiding this comment.
I think that we should make this information consistent in both det_curve and roc_curve.
I might change the roc_curve in this PR as well.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
I added support to |
ogrisel
left a comment
There was a problem hiding this comment.
Edit: Adding a threshold at infinity lead to redundant thresholds, so we decided to include a drop_intermediate option as implemented in precision_recall_curve and roc_curve.
It seems that adding the drop_intermediate option is not strictly required for this PR: currently it is disabled in the display class and the example still renders fine.
I am not opposed to adding this option as part of this PR, but then it should be exposed in the .from_estimator and .from_predictions methods of the display class (and probably enabled by default, both for the sake of matplotlib rendering efficiency and consistency with other displays).
This new drop_intermediate option should also be documented as an enh in a new changelog entry and the tested when called from the display class methods.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
|
|
||
| class DetCurveDisplay(_BinaryClassifierCurveDisplayMixin): | ||
| """DET curve visualization. | ||
| """Detection Error Tradeoff (DET) curve visualization. |
There was a problem hiding this comment.
Same as def det_curve: Could you fix the below link to the user guide to https://scikit-learn.org/stable/modules/model_evaluation.html#detection-error-tradeoff-det?
There was a problem hiding this comment.
Done for DetCurveDisplay and it's methods. The cross-references to the user guide in in the RocCurveDisplay and in PrecisionRecallDisplay point to https://scikit-learn.org/stable/visualizations.html, shall we change those as well in another PR?
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
|
2 follow-up PRs should be opened:
|
| drop_intermediate : bool, default=True | ||
| Whether to drop thresholds where true positives (tp) do not change | ||
| from the previous or subsequent threshold. All points with the same | ||
| tp value have the same `fnr` and thus same y coordinate. |
There was a problem hiding this comment.
@ArturoAmorQ could you please open a follow-up PR to add the missing ..versionadded markers for the new public parameters?
| Decreasing score values. | ||
| Decreasing thresholds on the decision function (either `predict_proba` | ||
| or `decision_function`) used to compute FPR and FNR. An arbitrary | ||
| threshold at infinity is added for the case `fpr=0` and `fnr=1`. |
There was a problem hiding this comment.
Maybe this is worth adding a versionchanged directive.
…ity (scikit-learn#29151) Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Follow-up from #24818. See also #30352.
What does this implement/fix? Explain your changes.
This PR implements the same logic used in #26194 for setting a threshold at infinity. Then uses it to show the chance level in our current example on DET curves.
Edit: Adding a threshold at infinity lead to redundant thresholds, so we decided to include a
drop_intermediateoption as implemented inprecision_recall_curveandroc_curve.It also takes the opportunity to set a
random_statein theRandomForestClassifierfor deterministic plots.Any other comments?
Maybe in a follow-up we can implement the
plot_chance_leveloption as introduced in #25929.