[MRG] run check_estimator on meta-estimators#9741
[MRG] run check_estimator on meta-estimators#9741amueller wants to merge 19 commits intoscikit-learn:mainfrom
Conversation
|
some rebase :-/ I'll fix the history once #9716 is merged. |
|
I guess we need a tag to skip the "modify init parameter" check as long as pipeline and feature union violate the API? |
remove outdated comment fix also for FeatureUnion [MRG+2] Limiting n_components by both n_features and n_samples instead of just n_features (Recreated PR) (scikit-learn#8742) [MRG+1] Remove hard dependency on nose (scikit-learn#9670) MAINT Stop vendoring sphinx-gallery (scikit-learn#9403) CI upgrade travis to run on new numpy release (scikit-learn#9096) CI Make it possible to run doctests in .rst files with pytest (scikit-learn#9697) * doc/datasets/conftest.py to implement the equivalent of nose fixtures * add conftest.py in root folder to ensure that sklearn local folder is used rather than the package in site-packages * test doc with pytest in Travis * move custom_data_home definition from nose fixture to .rst file [MRG+1] avoid integer overflow by using floats for matthews_corrcoef (scikit-learn#9693) * Fix bug#9622: avoid integer overflow by using floats for matthews_corrcoef * matthews_corrcoef: cosmetic change requested by jnothman * Add test_matthews_corrcoef_overflow for Bug#9622 * test_matthews_corrcoef_overflow: clean-up and make deterministic * matthews_corrcoef: pass dtype=np.float64 to sum & trace instead of using astype * test_matthews_corrcoef_overflow: add simple deterministic tests TST Platform independent hash collision tests in FeatureHasher (scikit-learn#9710) TST More informative error message in test_preserve_trustworthiness_approximately (scikit-learn#9738) add some rudimentary tests for meta-estimators fix extra whitespace in error message add missing if_delegate_has_method in pipeline don't test tuple pipeline for now only copy list if not list already? doesn't seem to help?
5cb5323 to
307c360
Compare
|
@jnothman do you have an opinion on what to do about the "modifying init param in fit" failure for pipelines? Should we create a work-around or start the worst deprecation cycle yet and introduce |
|
I don't think we can avoid the modifying init param in fit failure for a
while.
The deprecation is a 1.0 sort of an issue. As I've said before, it's best
enabled by a future parameter, rather than a straightforward deprecation.
But first we have to be sure we can handle all common use-cases for
pipelines in an intuitive way. Pipeline(other_pipeline.steps[:-1]) is
currently straightforward whether or not other_pipeline is fitted, for
example.
…On 20 September 2017 at 04:18, Andreas Mueller ***@***.***> wrote:
@jnothman <https://github.com/jnothman> do you have an opinion on what to
do about the "modifying init param in fit" failure for pipelines? Should we
create a work-around or start the worst deprecation cycle yet and introduce
steps_?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9741 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6xd38TWn3t5gk3K8hCBWQewVbdlCks5skAV6gaJpZM4PVBwz>
.
|
|
Ok, so skip that test? I'll go back to finishing up the tags then ;) |
Depends if you see that as a bug of a feature. If one passes an estimator to a pipeline, naively I would expect that estimator to be fitted when the pipeline is fit, as it happening now. Not have a some separate cloned instance created in the pipeline. At least that's what happens in DL libraries I think and it makes more sense to me, aside the fact that it doesn't respect the scikit-learn API contract. Was that contract designed with pipelines in view though? |
|
I think there's a separate issue to track that, #8157. Maybe it makes sense to figure out a short-term solution here for the tests? |
# Conflicts: # sklearn/pipeline.py # sklearn/utils/estimator_checks.py
|
Ok so I have no idea how to make |
|
I'm hacking around the pipeline cloning issue here. The missing value tags are a bit of an issue and generally defining tags for pipelines will be tricky, but this is a start. I didn't add ColumnTransformer yet, because there are too many issues. |
|
ping @NicolasHug |
| # grid-search | ||
| GridSearchCV(LogisticRegression(), {'C': [0.1, 1]}, cv=2), | ||
| # will fail tragically | ||
| # make_pipeline(StandardScaler(), None) |
There was a problem hiding this comment.
Is this failing because of how None or 'passthrough' does not support _get_tags?
There was a problem hiding this comment.
there were multiple issues
|
the validation and ducktyping in pipeline is a mess. But I think adding at least some tests is good... |
Agreed. |
|
I was just reminded that |
|
I would expect the following to work as expected: from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y,
random_state=0)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])The unfitted pipeline will error. from sklearn.utils.validation import check_is_fitted
check_is_fitted(pipe, "n_features_in_")NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.And the following will not error. pipe.fit(X, y)
check_is_fitted(pipe, "n_features_in_") |
|
Our new |
|
I am confused regarding this topic. It is coming from time to time and I don't now recall what are we supposed to do (probably we don't know :)). From what I recall, we basically don't know what to validate in a I was under the impression that @adrinjalali do you recall the latest news regarding this topic? |
|
To me, a I'd just set a |
maybe |
|
So should I just submit a PR setting some random attribute in pipeline for check_is_fitted to work? |
I would be happy to review. It could be nice to have a review of @jnothman that always think about some side effects that I am not aware of. |
Let's go for it. I run into the pipeline +
|
|
The other PR only checks for |
|
Indeed sorry. It needs an update though. Let's reopen not to forget. |
|
@adrinjalali We already run the common tests on many meta estimators: scikit-learn/sklearn/tests/test_common.py Line 150 in 006ccdb scikit-learn/sklearn/tests/test_common.py Line 358 in 006ccdb I think this PR can be closed. |
Fixes #9443.
After #9716 I think it's time to add these checks.