[MRG] Support multi-label probability calibration by connorbrinton · Pull Request #13060 · scikit-learn/scikit-learn

connorbrinton · 2019-01-28T18:29:52Z

CalibratedClassifierCV now handles the calibration process in such a way that probability estimates can be calibrated for multi-label targets. Also loosens input validation requirements to better interoperate with Pipeline.
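For context, here is a minimal sketch of what calibrated multi-label probabilities look like. It does not use the changes in this PR. It relies only on released scikit-learn APIs and wraps `CalibratedClassifierCV` in `OneVsRestClassifier`, so each label column is calibrated as a separate binary problem. The synthetic dataset, base estimator, and `cv=3` are illustrative assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multi-label data: Y is an indicator matrix of shape (n_samples, 3).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=3, random_state=0
)

# One calibrated binary classifier per label column (a workaround, not this PR's approach).
base = LogisticRegression(max_iter=1000)
clf = OneVsRestClassifier(CalibratedClassifierCV(base, method="sigmoid", cv=3))
clf.fit(X, Y)

# Each column is an independent per-label probability; rows need not sum to 1.
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

With the changes proposed here, the intent is that `CalibratedClassifierCV` could be fitted on the indicator matrix directly, without the `OneVsRestClassifier` wrapper.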

Reference Issues/PRs

Fixes #8710.

What does this implement/fix? Explain your changes.

Changes include (roughly in source code order):

- Looser input validation on arguments passed to wrapped classifiers (fixes #8710, "CalibratedClassifierCV doesn't interact properly with Pipeline estimators")
- Target classes and type are determined once before cross-validation, rather than on each fold individually
- Label predictions from CalibratedClassifierCV.predict are obtained using LabelBinarizer.inverse_transform, which supports multi-label predictions
- Specialized logic in _CalibratedClassifier for handling binary classification problems is tidied and more thoroughly commented
- Shape of uncalibrated estimates from the wrapped classifier is checked against the expected shape in _CalibratedClassifier
- Logic in _CalibratedClassifier.predict_proba is simplified, with more comments explaining what's happening
- Tests for acceptance of 1D feature arrays as input and for production of valid multi-label probability predictions
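To illustrate the `LabelBinarizer.inverse_transform` point above, here is a small sketch of how a binarizer fitted on a multi-label indicator matrix turns per-label probabilities back into label predictions by thresholding at 0.5. The arrays are made-up values, and this shows the public `LabelBinarizer` behaviour only, not the PR's internal code.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Multi-label targets as an indicator matrix (3 samples, 3 labels).
Y_train = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0]])
binarizer = LabelBinarizer().fit(Y_train)

# Hypothetical calibrated probabilities, one column per label.
proba = np.array([[0.9, 0.2, 0.6],
                  [0.1, 0.7, 0.4]])

# For multi-label targets, inverse_transform thresholds each column at 0.5.
pred = binarizer.inverse_transform(proba)
print(pred)
```

Because the threshold is applied to each column independently, a sample can receive zero, one, or several labels.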

Any other comments?

Thanks for working on scikit-learn!

connorbrinton · 2019-02-08T22:33:37Z

Hi @qinhanmin2014, would you be able to review this pull request? I see that you've recently approved PRs affecting this module. If it would be more appropriate for someone else to review it, please let me know. Thanks! 🙂

cmarmo · 2020-06-04T09:29:22Z

Hi @connorbrinton, I know it has been a while and I'm really sorry for that. Are you still interested in finalizing your work? If yes, would you mind synchronizing with upstream? Thanks a lot for your patience!

sklearn/calibration.py

connorbrinton · 2020-06-15T19:52:28Z

Hi @cmarmo, thanks for following up on this PR 🙂 I rebased with upstream and fixed all of the new failures that resulted, so this PR should be ready for review again 👍

@glemaitre Let me know if you have any questions or requests for this PR once you get the chance to review it 😄

There is a tiny amount of overlap between this PR and #17546, but the changes seem to be mostly complementary. Differences include:

- In #17546 (ENH Support pipelines in CalibratedClassifierCV), input validation and CV fold label checks are only performed when cv != "prefit"
- In this PR, training input validation is loosened to allow non-numeric X (such as text input) and to avoid reshaping 1D input to 2D (since the base estimator can do that, if needed)
- This PR modifies the call to check_array in predict_proba to match the validation performed during training. In #17546 it might make sense for this validation to be conditioned on cv != "prefit", to be consistent with training.
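The loosened validation described above can be sketched with `check_array` directly. The exact arguments used in the PR are not shown in this thread, so `dtype=None` and `ensure_2d=False` below are assumptions chosen to show the effect: text input passes through unconverted and 1D input is not rejected or reshaped.

```python
import numpy as np
from sklearn.utils import check_array

# Assumed loosened settings: no numeric conversion, no 2D requirement.
raw_text = ["good product", "bad service", "would buy again"]
texts = check_array(raw_text, dtype=None, ensure_2d=False)
print(texts.shape, texts.dtype.kind)

# With default settings, check_array requires 2D input and rejects a 1D array.
try:
    check_array(np.array([1.0, 2.0, 3.0]))
    rejected_by_default = False
except ValueError:
    rejected_by_default = True
print(rejected_by_default)
```

Leaving the reshaping and type conversion to the wrapped estimator is what lets a text-processing Pipeline sit inside the calibrator.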

I'd be happy to rebase on it once it's merged 😄

cmarmo · 2020-07-06T13:38:03Z

#17546 has been happily merged! 🚀
@connorbrinton do you mind resolving conflicts? Thanks for your patience.

cmarmo · 2020-08-24T19:46:14Z

@connorbrinton let us know if you need any help with the sync with upstream.

connorbrinton · 2023-05-04T17:34:50Z

I'm not planning on updating this PR any more. Some of the features implemented by this PR have been implemented elsewhere, and other features haven't seemed to garner interest from maintainers. It's been a long time since there's been any movement on this PR, so there's no reason to keep it open.

connorbrinton changed the title ~~[WIP] Support multi-label probability calibration~~ [MRG] Support multi-label probability calibration Jan 28, 2019

connorbrinton force-pushed the calibration-multilabel-support branch from fbc3813 to 51d2e38 Compare February 8, 2019 20:59

connorbrinton mentioned this pull request Apr 1, 2019

[MRG] Allow nd array for CalibratedClassifierCV #13485

Merged

amueller added the Waiting for Reviewer label Aug 6, 2019

connorbrinton force-pushed the calibration-multilabel-support branch 2 times, most recently from 8729d24 to 797514c Compare June 10, 2020 22:07

cmarmo reviewed Jun 11, 2020


connorbrinton force-pushed the calibration-multilabel-support branch 12 times, most recently from 7c4dd0a to a30c895 Compare June 12, 2020 21:13

cmarmo mentioned this pull request Jun 15, 2020

ENH Support pipelines in CalibratedClassifierCV #17546

Merged

connorbrinton force-pushed the calibration-multilabel-support branch from a30c895 to f3e82e2 Compare July 6, 2020 14:58

Base automatically changed from master to main January 22, 2021 10:50

connorbrinton force-pushed the calibration-multilabel-support branch 3 times, most recently from 0c8c5e8 to cd45aaf Compare February 27, 2021 18:00

Loosen CalibratedClassifierCV input validation

91a348e

These changes loosen `CalibratedClassifierCV`'s input validation to accept one-dimensional and non-numeric data (such as text).

connorbrinton force-pushed the calibration-multilabel-support branch from cd45aaf to 91a348e Compare February 27, 2021 18:18

cmarmo added the module:calibration label Jan 24, 2022

cmarmo added the Needs Decision and Stalled labels and removed the Waiting for Reviewer label Feb 14, 2022

connorbrinton closed this May 4, 2023

connorbrinton wants to merge 1 commit into scikit-learn:main from connorbrinton:calibration-multilabel-support
