FIX make sure OutputCodeClassifier rejects complex labels #20219
ogrisel wants to merge 1 commit into scikit-learn:main from
Conversation
BTW, I could not reproduce the original failure on my machine. I am not sure why the nested call to
    """
    y = column_or_1d(y, warn=True)
    _assert_all_finite(y)
    _ensure_no_complex_data(y)
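For context, the added `_ensure_no_complex_data(y)` call is what rejects complex-valued labels. Below is a minimal standalone sketch of such a check; the function name and error message here are illustrative, while the real helper lives in `sklearn.utils.validation`:

```python
import numpy as np

def ensure_no_complex_data(array):
    # Hypothetical standalone version of the complex-data check:
    # raise as soon as the dtype is complex-valued.
    array = np.asarray(array)
    if np.issubdtype(array.dtype, np.complexfloating):
        raise ValueError("Complex data not supported\n{!r}\n".format(array))

# Real-valued labels pass through, complex labels are rejected.
ensure_no_complex_data(np.array([0.0, 1.0]))  # no error
try:
    ensure_no_complex_data(np.array([1 + 2j, 3j]))
except ValueError as exc:
    print(type(exc).__name__)  # prints ValueError
```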
Is check_X_y doing it? We might have forgotten to add this check in some other PR of @jeremiedbb.
No, it's not. It's just that _ConstantPredictor does no check at all. There's a discussion in #20205.
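To illustrate that failure mode, here is a hedged sketch of a constant predictor that skips input validation entirely, so complex labels slip through without an error. The class below is made up for illustration; the actual implementation is `_ConstantPredictor` in `sklearn.multiclass`:

```python
import numpy as np

class ConstantPredictor:
    """Illustrative predictor that memorizes a single label value."""

    def fit(self, X, y):
        # No input validation at all: a complex-valued y is accepted
        # silently instead of raising a ValueError.
        self.y_ = np.asarray(y)[0]
        return self

    def predict(self, X):
        # Always predict the memorized constant.
        return np.repeat(self.y_, len(X))

pred = ConstantPredictor().fit(None, np.array([1 + 2j]))
print(pred.predict(np.zeros((3, 1))))  # complex output, no error raised
```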
Actually we don't, but we probably should:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/validation.py#L883-L890
But why does this test pass most of the time and fail randomly? I could not reproduce the failure on my machine. Also, I don't understand why the constant predictor would kick in in this case. The RNG seems properly seeded, so one should always get the validation of X by LogisticRegression._validate_data.
Actually, no, we do not set the random_state in check_complex_data on the estimator clone. That does not explain why this test does not fail for me, though, but it is a bug on its own. I will submit another PR.
Closing in favor of #20221 (hopefully).
This is a quick fix for the random failure reported in #20218 after the merge of #20192. I am not sure about the root cause of the randomness, but we indeed need to factorize y validation when X is not validated in a meta-estimator, as @jeremiedbb suggested elsewhere. In the meantime, let's make sure that the CI can be green in main to avoid disturbing concurrent PRs.
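The pattern described above, validating y in the meta-estimator while delegating X validation to the wrapped estimator, can be sketched as follows. All names here are illustrative assumptions, not the actual scikit-learn API:

```python
import numpy as np

class MetaEstimator:
    """Illustrative meta-estimator: validates y itself, leaves X to the
    wrapped estimator (which is assumed to validate X in its own fit)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Validate y here because X is passed through unvalidated.
        y = np.ravel(np.asarray(y))
        if np.issubdtype(y.dtype, np.complexfloating):
            raise ValueError("Complex data not supported")
        # X validation is delegated to the inner estimator.
        self.estimator_ = self.estimator.fit(X, y)
        return self
```

With this factorization, complex labels are rejected at the meta-estimator level even when the inner estimator (e.g. a constant predictor that skips checks) would have accepted them silently.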