Skip to content

Fix a bug in _k_means.pyx, when KMeans.fit(X.T)#4507

Closed
zhaipro wants to merge 4 commits intoscikit-learn:masterfrom
zhaipro:bug_k_means
Closed

Fix a bug in _k_means.pyx, when KMeans.fit(X.T)#4507
zhaipro wants to merge 4 commits intoscikit-learn:masterfrom
zhaipro:bug_k_means

Conversation

@zhaipro
Copy link
Copy Markdown

@zhaipro zhaipro commented Apr 3, 2015

bad case:

X = np.array([[0, 0, 0], [0, 1, 1]])
estimator = KMeans(n_clusters=2, n_init=1, max_iter=20, \
                   precompute_distances=False)
estimator.fit(X.T)
print X.T
print estimator.labels_
print estimator.cluster_centers_
print estimator.n_iter_

bad case:
X = np.array([[0, 0, 0], [0, 1, 1]])
estimator = KMeans(n_clusters=2, n_init=1, max_iter=20, \
                   precompute_distances=False)
estimator.fit(X.T)
print X.T
print estimator.labels_
print estimator.cluster_centers_
print estimator.n_iter_
@amueller amueller added the Bug label Apr 3, 2015
@amueller
Copy link
Copy Markdown
Member

amueller commented Apr 3, 2015

Thanks for the fix.
Can you please add a non-regression test?

@amueller
Copy link
Copy Markdown
Member

amueller commented Apr 3, 2015

I didn't check in detail but couldn't there also be a problem if centers is fortran ordered?

@amueller
Copy link
Copy Markdown
Member

amueller commented Apr 3, 2015

We could also make sure that X is C contiguous, have you done any benchmarks by any chance?

@zhaipro
Copy link
Copy Markdown
Author

zhaipro commented Apr 4, 2015

We could also make sure that X is C contiguous, have you done any benchmarks by any chance?

Yes, when I use PCA.

X = np.array([[0, 0], [0, 1]])
pca = PCA()
new_X = pca.fit_transform(X)
X.strides
>>> (4, 8)
new_X.strides
>>> (8, 4)

@zhaipro
Copy link
Copy Markdown
Author

zhaipro commented Apr 4, 2015

I didn't check in detail but couldn't there also be a problem if centers is fortran ordered?

I think you are right. It also be a problem.

@landscape-bot
Copy link
Copy Markdown

Code Health
Repository health increased by 0.01% when pulling a6cf751 on zhaipro:bug_k_means into 6560a8e on scikit-learn:master.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Apr 6, 2015

Indeed this is bad bug. Could you please add a non-regression test on using toy fortran-aligned data by using np.asfortranarray?

@ogrisel ogrisel added this to the 0.16.1 milestone Apr 6, 2015
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: please use whitespaces around the / sign.

@zhaipro
Copy link
Copy Markdown
Author

zhaipro commented Apr 6, 2015

I will perfect it as soon as possible.

@amueller
Copy link
Copy Markdown
Member

amueller commented Apr 6, 2015

@ogrisel the thing is that we should probably add a common test for this, right?

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Apr 6, 2015

+1 for a common test.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.0%) to 95.15% when pulling e77d36e on zhaipro:bug_k_means into 6560a8e on scikit-learn:master.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Apr 6, 2015

Can you please fix the random_state argument of KMeans in your test, e.g. random_state=42. That should not change anything on such a small dataset but it better to always fix the RNG in tests to make the failure reproducible and independent of other tests.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Apr 6, 2015

@zhaipro have you check that your new test fails when run against the code in the current master?

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Apr 6, 2015

Could you also please squash your intermediate commits together, ideally keeping one for the FIX and one for the TST.

@zhaipro
Copy link
Copy Markdown
Author

zhaipro commented Apr 6, 2015

have you check that you test fails when run against the code in the current master?

@ogrisel Yes, I checked.

@amueller
Copy link
Copy Markdown
Member

amueller commented Apr 6, 2015

just force-push the squashed commits into your bug_k_means branch.

amueller added a commit that referenced this pull request Apr 8, 2015
[MRG + 1] squash commits together for #4507
@amueller
Copy link
Copy Markdown
Member

Fixed in #4531.

@amueller amueller closed this Apr 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants