FEA Add strategy isotonic to calibration curve #23824
lorentzenchr wants to merge 7 commits into scikit-learn:main
Conversation
Results

From the example of CalibrationDisplay:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibrationDisplay
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
X, y, random_state=0)
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)
fig, ax = plt.subplots()
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax)
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax, strategy="isotonic")
ax.get_legend().get_texts()[1].set_text('LogisticRegression uniform')
ax.get_legend().get_texts()[2].set_text('LogisticRegression isotonic')

From https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#calibration-curves:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
from sklearn.calibration import CalibrationDisplay
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
X, y = make_classification(
n_samples=100_000, n_features=20, n_informative=2, n_redundant=2, random_state=42
)
train_samples = 100 # Samples used for training the models
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
shuffle=False,
test_size=100_000 - train_samples,
)
class NaivelyCalibratedLinearSVC(LinearSVC):
"""LinearSVC with `predict_proba` method that naively scales
`decision_function` output."""
def fit(self, X, y):
super().fit(X, y)
df = self.decision_function(X)
self.df_min_ = df.min()
self.df_max_ = df.max()
def predict_proba(self, X):
"""Min-max scale output of `decision_function` to [0,1]."""
df = self.decision_function(X)
calibrated_df = (df - self.df_min_) / (self.df_max_ - self.df_min_)
proba_pos_class = np.clip(calibrated_df, 0, 1)
proba_neg_class = 1 - proba_pos_class
proba = np.c_[proba_neg_class, proba_pos_class]
return proba
# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = NaivelyCalibratedLinearSVC(C=1.0)
rfc = RandomForestClassifier()
clf_list = [
(lr, "Logistic"),
(gnb, "Naive Bayes"),
(svc, "SVC"),
(rfc, "Random forest"),
]
fig = plt.figure(figsize=(10, 10))
gs = GridSpec(4, 2)
colors = plt.cm.get_cmap("Dark2")
ax_calibration_curve = fig.add_subplot(gs[:2, :2])
calibration_displays = {}
for i, (clf, name) in enumerate(clf_list):
clf.fit(X_train, y_train)
display = CalibrationDisplay.from_estimator(
clf,
X_test,
y_test,
n_bins=10,
strategy="isotonic",
name=name,
ax=ax_calibration_curve,
color=colors(i),
)
calibration_displays[name] = display
ax_calibration_curve.grid()
ax_calibration_curve.set_title("Calibration plots")
plt.show()
Force-pushed from aa0e0d6 to 34f77b7.
@ogrisel @glemaitre You might be interested.
thomasjpfan left a comment:
As you noted in #23132 (comment), the CORP paper does not meet our inclusion criterion: according to Google Scholar it has been cited 13 times.
If we cannot include the method based on that criterion, an alternative is to accept a callable here, which would make it simple to implement CORP:
def calibration_curve(...):
    ...
    elif callable(strategy):
        # n_bins passed through for flexibility
        return strategy(y_prob, y_true, n_bins)

and strategy is:

def strategy(y_prob, y_true, n_bins):
    iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
    prob_true = iso.y_thresholds_
    prob_pred = iso.X_thresholds_
    return prob_true, prob_pred

Then we can update a calibration example to showcase passing a callable and using the CORP strategy.
@thomasjpfan The point is that isotonic regression is already included in scikit-learn, so why not use it? In particular,
To give it more citation counts:


Reference Issues/PRs
Fixes #23132.
What does this implement/fix? Explain your changes.
This PR adds strategy="isotonic" to calibration_curve and CalibrationDisplay.
Any other comments?
Reliability diagrams with (PAV-algorithm) isotonic regression are the CORP approach of https://doi.org/10.1073/pnas.2016191118.
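A minimal sketch of what the PR would expose, using only the released API: the existing binning-based calibration_curve next to a direct isotonic (PAV/CORP) fit. The synthetic, well-calibrated-by-construction data is my own assumption for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(42)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)

# Existing binning-based estimate (released API)
prob_true_bin, prob_pred_bin = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="uniform"
)

# PAV estimate, as this PR would expose via strategy="isotonic"
iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
prob_true_iso, prob_pred_iso = iso.y_thresholds_, iso.X_thresholds_

# The isotonic estimate is monotone by construction; binned estimates need not be
assert np.all(np.diff(prob_true_iso) >= 0)
```

The isotonic curve has a data-driven number of points (one per PAV block), whereas the binned curve is tied to the chosen n_bins.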