FEA Add strategy isotonic to calibration curve #23824
lorentzenchr wants to merge 7 commits into scikit-learn:main
Conversation
Results

From the example of CalibrationDisplay:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibrationDisplay
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
X, y, random_state=0)
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)
fig, ax = plt.subplots()
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax)
CalibrationDisplay.from_estimator(clf, X_test, y_test, ax=ax, strategy="isotonic")
ax.get_legend().get_texts()[1].set_text('LogisticRegression uniform')
ax.get_legend().get_texts()[2].set_text('LogisticRegression isotonic')

From https://scikit-learn.org/stable/auto_examples/calibration/plot_compare_calibration.html#calibration-curves:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
from sklearn.calibration import CalibrationDisplay
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
X, y = make_classification(
n_samples=100_000, n_features=20, n_informative=2, n_redundant=2, random_state=42
)
train_samples = 100 # Samples used for training the models
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
shuffle=False,
test_size=100_000 - train_samples,
)
class NaivelyCalibratedLinearSVC(LinearSVC):
"""LinearSVC with `predict_proba` method that naively scales
`decision_function` output."""
def fit(self, X, y):
super().fit(X, y)
df = self.decision_function(X)
self.df_min_ = df.min()
self.df_max_ = df.max()
def predict_proba(self, X):
"""Min-max scale output of `decision_function` to [0,1]."""
df = self.decision_function(X)
calibrated_df = (df - self.df_min_) / (self.df_max_ - self.df_min_)
proba_pos_class = np.clip(calibrated_df, 0, 1)
proba_neg_class = 1 - proba_pos_class
proba = np.c_[proba_neg_class, proba_pos_class]
return proba
# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = NaivelyCalibratedLinearSVC(C=1.0)
rfc = RandomForestClassifier()
clf_list = [
(lr, "Logistic"),
(gnb, "Naive Bayes"),
(svc, "SVC"),
(rfc, "Random forest"),
]
fig = plt.figure(figsize=(10, 10))
gs = GridSpec(4, 2)
colors = plt.cm.get_cmap("Dark2")
ax_calibration_curve = fig.add_subplot(gs[:2, :2])
calibration_displays = {}
for i, (clf, name) in enumerate(clf_list):
clf.fit(X_train, y_train)
display = CalibrationDisplay.from_estimator(
clf,
X_test,
y_test,
n_bins=10,
strategy="isotonic",
name=name,
ax=ax_calibration_curve,
color=colors(i),
)
calibration_displays[name] = display
ax_calibration_curve.grid()
ax_calibration_curve.set_title("Calibration plots")
plt.show()
Force-pushed from aa0e0d6 to 34f77b7.
@ogrisel @glemaitre You might be interested.
thomasjpfan left a comment:
As you noted in #23132 (comment), the CORP paper does not meet our inclusion criterion: according to Google Scholar it has been cited 13 times.
If we cannot include the method based on that criterion, an alternative is to accept a callable here, which would make it simple to implement CORP:
def calibration_curve(...):
    ...
    elif callable(strategy):
        # n_bins passed through for flexibility
        return strategy(y_prob, y_true, n_bins)

and strategy is:

def strategy(y_prob, y_true, n_bins):
    iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
    prob_true = iso.y_thresholds_
    prob_pred = iso.X_thresholds_
    return prob_true, prob_pred

Then we can update a calibration example to showcase passing a callable and using the CORP strategy.
@thomasjpfan The point is that isotonic regression is already included in scikit-learn, so why not use it? In particular,
To give it more citation counts:


Reference Issues/PRs
Fixes #23132.
What does this implement/fix? Explain your changes.
This PR adds strategy="isotonic" to calibration_curve and CalibrationDisplay.
Any other comments?
Reliability diagrams with (PAV-algorithm) isotonic regression are the CORP approach of https://doi.org/10.1073/pnas.2016191118.
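A minimal sketch of what the PR would expose, using only the released API: the existing binning-based calibration_curve next to a direct isotonic (PAV/CORP) fit. The synthetic, well-calibrated-by-construction data is my own assumption for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(42)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)

# Existing binning-based estimate (released API)
prob_true_bin, prob_pred_bin = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="uniform"
)

# PAV estimate, as this PR would expose via strategy="isotonic"
iso = IsotonicRegression(y_min=0, y_max=1).fit(y_prob, y_true)
prob_true_iso, prob_pred_iso = iso.y_thresholds_, iso.X_thresholds_

# The isotonic estimate is monotone by construction; binned estimates need not be
assert np.all(np.diff(prob_true_iso) >= 0)
```

The isotonic curve has a data-driven number of points (one per PAV block), whereas the binned curve is tied to the chosen n_bins.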