[MRG] MNT requires_y tag with y=None validation#16622
Merged
jnothman merged 106 commits intoscikit-learn:masterfrom Apr 22, 2020
Merged
[MRG] MNT requires_y tag with y=None validation#16622jnothman merged 106 commits intoscikit-learn:masterfrom
jnothman merged 106 commits intoscikit-learn:masterfrom
Conversation
…ranch 'master' of github.com:scikit-learn/scikit-learn into n_features_in
Member
|
It can work out, but making it neat would take more work than I could put
in at the time. It frustrated me that some classes inherit from
RegressorMixin despite being optionally unsupervised. Admittedly the whole
idea of being optionally supervised in cross decomposition invalidates the
point of this tag.
|
Member
Author
Do you feel the same about a |
Member
No, that more explicitly matches what the tag is used to ensure. |
Member
Author
|
The tag is now |
Co-Authored-By: Adrin Jalali <adrin.jalali@gmail.com>
…n into is_supervised_tag
adrinjalali
approved these changes
Apr 20, 2020
agramfort
approved these changes
Apr 20, 2020
Contributor
|
Hi @scikit-learn/core-devs, time to merge this one? |
Member
|
+1 on my side
maybe @ogrisel or @GaelVaroquaux want to have a look?
… |
Member
Wonderful. More precise/explicit, and I think also a shorter PR. Thanks so much for your efforts, @NicolasHug. Only remaining issue: should the new tag be mentioned in what's new? |
Member
|
Thanks @NicolasHug |
Member
|
thanks heaps @NicolasHug |
gio8tisu
pushed a commit
to gio8tisu/scikit-learn
that referenced
this pull request
May 15, 2020
viclafargue
pushed a commit
to viclafargue/scikit-learn
that referenced
this pull request
Jun 26, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow up to #16112
EDIT: the tag is now
requires_yThis PR adds a
is_supervisedtag and uses it to raise an error in_validate_data()if:a non-supervised estimator is passed y!=NoneThis PR is mostly motivated by the fact that since #16112, supervised estimators that are passed y=None will fail with a tuple unpacking error instead of a nicer error message.
This comes with a bunch of complications:
This forces supervised estimators to call
_validate_data(X, y), instead of validating X and y separately. Since callingcheck_array(X, ...)and thencheck_array(y, ...)isn't in general equivalent to callingcheck_X_y(X, y, ...), I had to introduce a way to check X and y separately when calling_validate_data(X, y, ...). This is really ugly. It will also definitely not work for third-party estimators that inherit from ours.Some estimators like OneClassSVM, IsolationForest, and RandomTreesEmbedding are unsupervised, but they generate a randomythat is passed down to their base classes and validated there with_validate_data(X, y). This makes_validate_datafail because of the check mentioned above. So we need custom checks in the validation of the base classes in each of these. This is really, really ugly.Since
_validate_dataassumes the tag exists, this means that all estimators usingvalidate_datamust also support the tag. IMO, that's fine.