-
-
Notifications
You must be signed in to change notification settings - Fork 26.9k
cross_val_predict returns bad prediction when evaluated on a dataset with very few samples #13366
Copy link
Copy link
Closed
Labels
Description
Description
cross_val_predict returns bad prediction when evaluated on a dataset with very few samples on 1 class, causing class being ignored on some CV splits.
Steps/Code to Reproduce
from sklearn.datasets import *
from sklearn.linear_model import *
from sklearn.model_selection import *
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2,
random_state=1, n_clusters_per_class=1)
# Change the first sample to a new class
y[0] = 2
clf = LogisticRegression()
cv = StratifiedKFold(n_splits=2, random_state=1)
train, test = list(cv.split(X, y))
yhat_proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
print(yhat_proba)Expected Results
[[0.06105412 0.93894588 0. ]
[0.92512247 0.07487753 0. ]
[0.93896471 0.06103529 0. ]
[0.04345507 0.95654493 0. ]
Actual Results
[[0. 0. 0. ]
[0. 0. 0. ]
[0. 0. 0. ]
[0. 0. 0. ]
Versions
Verified on the scikit latest dev version.
Reactions are currently unavailable