Redundant execution of check_class_weight_balanced_linear_classifier #33154
Description
Describe the bug and give evidence about its user-facing impact
In sklearn/utils/estimator_checks.py, the function _yield_classifier_checks yields check_class_weight_balanced_linear_classifier twice for linear classifiers that support the class_weight parameter.
This bug was discovered during a manual code review of estimator_checks.py. It affects any scikit-learn user or developer who uses check_estimator (or the underlying check generators) to validate estimators.
The impact is redundant test execution. For example, when running the common tests for LogisticRegression or for a custom linear classifier that inherits from LinearClassifierMixin, this check (which fits the estimator and verifies class weight handling) runs twice. That wastes CI time and can produce duplicated error messages or log entries, which makes debugging more tedious.
Steps/Code to Reproduce
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import _yield_classifier_checks
def get_name(check):
    # A check may be a plain function or a functools.partial wrapping one.
    return getattr(check, "__name__",
                   getattr(check, "func", check).__name__)
clf = LogisticRegression()
checks = list(_yield_classifier_checks(clf))
check_names = [get_name(c) for c in checks]
duplicate_count = check_names.count(
    'check_class_weight_balanced_linear_classifier'
)
print(f"Yield count: {duplicate_count}")
Expected Results
The check should be yielded exactly once. Yield count: 1
Actual Results
The check is currently yielded twice. Yield count: 2
Versions
python: 3.14.0
numpy: 2.4.1
scipy: 1.17.0
joblib: 1.5.3
scikit-learn: 1.7.dev0 (from source)
Interest in fixing the bug
I am interested in fixing this bug. I have already identified the root cause and implemented a fix.
Analysis of root cause: In _yield_classifier_checks, there are two consecutive if blocks that both check for isinstance(classifier, LinearClassifierMixin) and the presence of class_weight in parameters. Both blocks yield the same check.
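The pattern described above can be reproduced in isolation. The following is a minimal sketch, not the actual scikit-learn source: the LinearClassifierMixin and ToyLinearClassifier classes, the placeholder check function, and the two generator functions are all hypothetical stand-ins written for illustration. It shows how two consecutive if blocks with the same condition cause a double yield, and how removing one of them restores a single yield.

```python
from collections import Counter


class LinearClassifierMixin:
    """Hypothetical stand-in for the real mixin (assumption, for illustration)."""


class ToyLinearClassifier(LinearClassifierMixin):
    """Hypothetical estimator that exposes a class_weight parameter."""

    def __init__(self, class_weight=None):
        self.class_weight = class_weight

    def get_params(self):
        return {"class_weight": self.class_weight}


def check_class_weight_balanced_linear_classifier(name, estimator):
    """Placeholder check; the real check fits the estimator."""


def _supports_check(classifier):
    # Condition shared by both blocks in the buggy generator below.
    return (
        isinstance(classifier, LinearClassifierMixin)
        and "class_weight" in classifier.get_params()
    )


def yield_checks_buggy(classifier):
    # Two consecutive blocks with the same condition: the check is yielded twice.
    if _supports_check(classifier):
        yield check_class_weight_balanced_linear_classifier
    if _supports_check(classifier):
        yield check_class_weight_balanced_linear_classifier


def yield_checks_fixed(classifier):
    # A single block: the check is yielded once.
    if _supports_check(classifier):
        yield check_class_weight_balanced_linear_classifier


def count_yields(generator_func, classifier):
    # Count how often each check name appears in the generator's output.
    return Counter(check.__name__ for check in generator_func(classifier))


clf = ToyLinearClassifier()
check_name = "check_class_weight_balanced_linear_classifier"
buggy_counts = count_yields(yield_checks_buggy, clf)
fixed_counts = count_yields(yield_checks_fixed, clf)
print(f"Buggy yield count: {buggy_counts[check_name]}")
print(f"Fixed yield count: {fixed_counts[check_name]}")
```

A Counter over check names, as in count_yields, could also serve as a general regression test that no check generator yields the same check more than once, assuming every yielded object exposes a usable name.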