[MRG] Support multi-label probability calibration#13060

Closed
connorbrinton wants to merge 1 commit into scikit-learn:main from
connorbrinton:calibration-multilabel-support

@connorbrinton

CalibratedClassifierCV can now calibrate probability estimates for multi-label targets. This PR also loosens input validation requirements so that CalibratedClassifierCV interoperates better with Pipeline.

Reference Issues/PRs

Fixes #8710.

What does this implement/fix? Explain your changes.

Changes include (roughly in source code order):

  • Looser input validation on arguments passed to wrapped classifiers (fixes #8710, "CalibratedClassifierCV doesn't interact properly with Pipeline estimators")
  • Target classes and type are determined before cross-validation, rather than on each fold individually
  • Label predictions from CalibratedClassifierCV.predict are obtained using LabelBinarizer.inverse_transform, which supports multi-label predictions
  • Specialized logic in _CalibratedClassifier for handling binary classification problems is tidied and more thoroughly commented
  • Shape of uncalibrated estimates from wrapped classifier is checked against the expected shape in _CalibratedClassifier
  • Simplification of logic in _CalibratedClassifier.predict_proba along with more comments explaining what's happening
  • Tests for acceptance of 1D feature arrays as input and production of valid multi-label probability predictions
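
For readers who want a feel for what multi-label calibrated output looks like, here is a minimal sketch. It is not the implementation in this PR. It assumes the released scikit-learn API and approximates per-label calibration by wrapping a binary `CalibratedClassifierCV` in `OneVsRestClassifier`, then recovers label predictions with `LabelBinarizer.inverse_transform`, which thresholds multi-label scores at 0.5 by default. The dataset, the base estimator and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer

# Synthetic multi-label data: Y is a (n_samples, n_labels) 0/1 indicator matrix.
X, Y = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=3, n_labels=2, random_state=0
)

# Assumption: one independently calibrated binary classifier per label column.
# This is a stand-in for the PR's approach, not a copy of it.
per_label = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="sigmoid", cv=3
)
model = OneVsRestClassifier(per_label).fit(X, Y)

# One calibrated probability per (sample, label); rows need not sum to 1.
proba = model.predict_proba(X)

# A LabelBinarizer fitted on an indicator matrix thresholds scores on
# inverse_transform, so each label is predicted independently.
binarizer = LabelBinarizer().fit(Y)
labels = binarizer.inverse_transform(proba)

print(proba.shape, labels.shape)
```

Because each label gets its own calibrator, the probabilities in a row are independent estimates rather than a distribution over mutually exclusive classes.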

Any other comments?

Thanks for working on scikit-learn!

@connorbrinton connorbrinton changed the title [WIP] Support multi-label probability calibration [MRG] Support multi-label probability calibration Jan 28, 2019
@connorbrinton connorbrinton force-pushed the calibration-multilabel-support branch from fbc3813 to 51d2e38 Compare February 8, 2019 20:59
@connorbrinton
Author

Hi @qinhanmin2014, would you be able to review this pull request? I see that you've recently approved PRs affecting this module. If it would be more appropriate for someone else to review it, please let me know. Thanks! 🙂

@cmarmo
Contributor

cmarmo commented Jun 4, 2020

Hi @connorbrinton, I know it has been a while and I'm really sorry for that. Are you still interested in finalizing your work? If so, would you mind synchronizing with upstream? Thanks a lot for your patience!

@connorbrinton connorbrinton force-pushed the calibration-multilabel-support branch 2 times, most recently from 8729d24 to 797514c Compare June 10, 2020 22:07
@connorbrinton connorbrinton force-pushed the calibration-multilabel-support branch 12 times, most recently from 7c4dd0a to a30c895 Compare June 12, 2020 21:13
@connorbrinton
Author

connorbrinton commented Jun 15, 2020

Hi @cmarmo, thanks for following up on this PR 🙂 I rebased with upstream and fixed all of the new failures that resulted, so this PR should be ready for review again 👍

@glemaitre Let me know if you have any questions or requests for this PR once you get the chance to review it 😄

There is a tiny amount of overlap between this PR and #17546, but the changes seem to be mostly complementary. Differences include:

  • In #17546 (ENH Support pipelines in CalibratedClassifierCV), input validation and CV fold label checks are only performed when cv != "prefit"
  • In this PR, training input validation is loosened to allow non-numeric X (such as text input) and to not reshape 1D input to 2D (since the base estimator can do that, if needed)
  • This PR modifies the call to check_array in predict_proba to match the validation performed during training. In #17546 it might make sense for this validation to be conditioned on cv != "prefit", to be consistent with training.
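
To make the validation difference concrete, here is a small sketch of how `check_array` behaves on one-dimensional text input under strict and loosened settings. The arguments `dtype=None` and `ensure_2d=False` are my assumption for illustrating the loosened behaviour; the exact arguments used in the PR may differ.

```python
import numpy as np
from sklearn.utils import check_array

# 1D, non-numeric input, e.g. raw documents headed for a text pipeline.
texts = ["good movie", "bad movie", "great plot"]

# Default validation expects a 2D numeric array, so it rejects this input.
try:
    check_array(texts)
    strict_ok = True
except ValueError:
    strict_ok = False

# Loosened validation (assumed arguments): keep the original dtype and
# dimensionality, leaving any reshaping or vectorizing to the base estimator.
loose = check_array(texts, dtype=None, ensure_2d=False)

print(strict_ok, loose.shape, loose.dtype.kind)
```

With validation this permissive, a wrapped Pipeline whose first step is a vectorizer receives the input unchanged and can do its own conversion.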

I'd be happy to rebase on it once it's merged 😄

@cmarmo
Contributor

cmarmo commented Jul 6, 2020

#17546 has been happily merged! 🚀
@connorbrinton do you mind resolving conflicts? Thanks for your patience.

@connorbrinton connorbrinton force-pushed the calibration-multilabel-support branch from a30c895 to f3e82e2 Compare July 6, 2020 14:58
@cmarmo
Contributor

cmarmo commented Aug 24, 2020

@connorbrinton let us know if you need any help with the sync with upstream.

Base automatically changed from master to main January 22, 2021 10:50
@connorbrinton connorbrinton force-pushed the calibration-multilabel-support branch 3 times, most recently from 0c8c5e8 to cd45aaf Compare February 27, 2021 18:00
These changes loosen `CalibratedClassifierCV`'s input validation to
accept one-dimensional and non-numeric data (such as text).
@connorbrinton
Author

I'm not planning on updating this PR any more. Some of the features implemented by this PR have been implemented elsewhere, and other features haven't seemed to garner interest from maintainers. It's been a long time since there's been any movement on this PR, so there's no reason to keep it open.

Development

Successfully merging this pull request may close these issues.

CalibratedClassifierCV doesn't interact properly with Pipeline estimators

4 participants