[MRG+1] FIX Pipelined fitting of Clustering algorithms, scoring of K-Means in pipelines#4064
Conversation
|
Okay, I guess that's one option; the alternative is to not pass |
|
I am just now a bit confused. |
|
we should do it consistently, but I guess it is to late for that now? |
|
I actually prefer the |
import numpy as np,sklearn
from sklearn.pipeline import Pipeline
from sklearn.cluster import MeanShift
X=np.random.rand(100,2)
p=Pipeline([('clf',MeanShift())])
p.fit(X)breaks and makes me sad. |
|
For the pipeline, it could be that previous steps in the pipeline use |
|
Don't be sad. As Gaël likes to say: it's not version 1 yet, stuff's allowed I can't say I have the answer here, yet, but I think you're thinking about On 8 January 2015 at 13:35, Andreas Mueller notifications@github.com
|
|
Yeah, I think this is the only way to make pipeline work as expected. Testing |
9f2ec74 to
f89c72f
Compare
f89c72f to
d55d272
Compare
|
reviews on this one would be nice as it makes testing somewhat simpler. |
sklearn/utils/estimator_checks.py
Outdated
There was a problem hiding this comment.
You need to add a warnings.simplefilter('always') here so that the registry doesn't catch some warnings and create heisenbugs... Better, decorate the whole test with @ignore_warnings
There was a problem hiding this comment.
that is not particular to this test, though, right? and I'm not sure the whole tests should be decorated. These are more for the deprecation warnings.
There was a problem hiding this comment.
I think it should be reasonable to ignore all sklearn warnings, at least, in common tests. Perhaps we should change ignore_warnings to only apply to sklearn modules.
There was a problem hiding this comment.
Interestingly enough, if I put @ignore_warnings on the test (not the check function) not test is run :-/
There was a problem hiding this comment.
I thought about adding a test for partial_fit, but I'm not sure about it. The pipelines need to look quite a bit different to support partial_fit, and there is no pressing reason currently to enforce taking a y in the API....
27ddff9 to
f02335a
Compare
|
Firstly, doc/developers/index.rst needs fixing. Secondly, I think this change should encompass |
|
What is the benefit in |
Consistency with |
|
Fair enough. |
|
Actually, there was a comment about |
|
Oh, I didn't look at it closely enough. It actually gives an invocation On 29 January 2015 at 10:32, Andreas Mueller notifications@github.com
|
|
Obscurely, it shows 1-ary On 29 January 2015 at 14:05, Joel Nothman joel.nothman@sydney.edu.au
|
There was a problem hiding this comment.
Indeed, this doc was already here. I guess I only looked at a glance at what I thought was an issue, but not really.
|
Great. This LGTM apart from your test failure, which I take it will be fixed before the next reviewer gives a +1 ;) |
|
Thanks for the review. Damn, I thought I just fixed travis. Will look at it again. |
e37f56a to
fa184fe
Compare
|
Fixed. |
|
Any other reviews? @ogrisel maybe? |
|
As the motivation of this change was to make it possible to call the clustering models in pipelines I would like to integrate a non-regression "integration" test in the spirit of #4064 (comment) in |
|
Other than that +1 on my side. |
|
Hum, I totally wanted to do that integration test with pipelines, seems I forgot about it. Not sure what should be tested, probably "everything". This PR has tests for the "API definition' of Pipeline, not doing the integration. |
|
I was thinking of a bunch of sanity checks such as: and the similar clustering and regressors and maybe others as long as they implement |
|
I thought about which functions we want to test, I guess that this issue only exists for |
|
|
cfee0a1 to
a105327
Compare
|
added integration test, rebased. I am a bit confused. That y is never used anywhere and not required by any API, is it? |
…f clustering algorithms!
a105327 to
cd63acc
Compare
|
Anyhow, this PR is good to go, I think. I removed the requirement on |
|
Great, merging this PR. Thanks again @amueller ! |
[MRG+1] FIX Pipelined fitting of Clustering algorithms, scoring of K-Means in pipelines
:) Rest assured this fix is in accordance with what's been in the docs for some time... |
Fixes #4063.
Also fixes the embarrassing bug where clustering algorithms can't be used in pipelines.