learning_curve - small dataset with CV - handle case when few positive examples. #7370

@julienaubert

Description

I am using learning_curve (from learning_curve.py) on an imbalanced dataset with few positive examples.

I ran into a problem: for smaller training sizes an exception is raised, which stops the learning curve from being generated, even though the fit would have succeeded once the training size increased. Choosing the right minimum training size by hand is not trivial, and in practice it would just produce a curve whose score is 0 for sizes below that minimum. The same result can be achieved much more simply by setting error_score=0.

However, learning_curve does not accept that parameter, so it calls _fit_and_score without it, and _fit_and_score falls back to its default, error_score='raise'.

I propose adding error_score='raise' as a parameter to learning_curve, which it would pass through to _fit_and_score. I am new to scikit-learn, so before I submit a PR: is this reasonable?
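
To make the proposal concrete, here is a minimal pure-Python sketch of the pass-through. It is not scikit-learn's code: `fit_and_score_sketch`, `learning_curve_sketch`, `toy_fit` and `toy_score` are hypothetical stand-ins, and the toy "estimator" simply refuses to fit when the training subset contains a single class, which is roughly what happens with few positives.

```python
def fit_and_score_sketch(fit_fn, score_fn, train, test, error_score="raise"):
    """Hypothetical stand-in for _fit_and_score (not the real implementation)."""
    try:
        model = fit_fn(train)
    except Exception:
        if error_score == "raise":
            raise  # default behaviour: the whole curve computation stops here
        return error_score  # otherwise record a placeholder score and continue
    return score_fn(model, test)


def learning_curve_sketch(fit_fn, score_fn, data, test, train_sizes,
                          error_score="raise"):
    """Hypothetical learning_curve that propagates error_score downwards."""
    return [
        fit_and_score_sketch(fit_fn, score_fn, data[:n], test, error_score)
        for n in train_sizes
    ]


def toy_fit(train):
    # Toy 1-D classifier: threshold halfway between the two class means.
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    if not xs0 or not xs1:
        raise ValueError("training subset contains only one class")
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def toy_score(threshold, test):
    # Accuracy of "predict 1 when x is above the threshold".
    hits = sum((x > threshold) == bool(label) for x, label in test)
    return hits / len(test)


# Made-up data: the first positive only appears at the third sample.
data = [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1), (0.3, 0)]
test = [(0.15, 0), (0.85, 1)]

scores = learning_curve_sketch(toy_fit, toy_score, data, test,
                               train_sizes=[1, 2, 3, 5], error_score=0)
```

With error_score=0 the two smallest training sizes are recorded as 0 and the curve still completes; with the default 'raise', the first failing size aborts the whole call.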

Steps/Code to Reproduce

import numpy as np
from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.svm import LinearSVC
from sklearn.metrics import make_scorer
from sklearn.metrics import f1_score
from sklearn.learning_curve import learning_curve


X = np.random.rand(5, 2)
y = np.array([0, 0, 1, 1, 0])
f1_score_label = make_scorer(f1_score, pos_label=1)
cv = StratifiedShuffleSplit(y, n_iter=10, test_size=0.25, random_state=0)
estimator = LinearSVC()
train_sizes_ratio = np.linspace(.1, 1.0, 5)

# error_score=0 is the proposed parameter; learning_curve does not accept it yet.
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=1, scoring=f1_score_label,
    train_sizes=train_sizes_ratio, error_score=0)

Expected Results

Actual Results

Versions
