[MRG+3] Collapsing PCA and RandomizedPCA #5299
Force-pushed b66eebf to c6b93f0.
I have the feeling that …
_BasePCA was originally introduced at the same time as IncrementalPCA, with …
Ah, OK, sorry for the confusion. I thought it was about the old …
Inheritance from …
Force-pushed 141f858 to bf6416a.
We have to decide what to do with this behaviour. This paragraph was in PCA's docstring; RandomizedPCA handled it calling …

To note: I have been working on the performance of …
Force-pushed 08f736f to 669f136.
sklearn/decomposition/pca.py
Force-pushed a62d139 to 9bb6750.
I agreed with @ogrisel to call …
Force-pushed 9bb6750 to b8a42a9.
Simple runtime benchmark script; the output supports the chosen …
sklearn/decomposition/pca.py
Force-pushed b8a42a9 to e7c9b36.
Instead of 'full' I would use 'lapack'.
sklearn/decomposition/pca.py
Better test:

```python
if isinstance(n_components, six.string_types):
```

In retrospect, "full" is a good name, but it would be worth mentioning in the docstring that this uses the standard SVD solver from LAPACK.
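For readers following along, the solver string can be exercised directly; a minimal sketch assuming the post-merge PCA API (the `svd_solver` parameter is the one introduced by this PR, the data is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(50, 10)

# svd_solver="full" dispatches to the standard LAPACK SVD
# (scipy.linalg.svd) rather than the randomized solver.
pca = PCA(n_components=3, svd_solver="full").fit(X)
```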
sklearn/decomposition/pca.py
It should be:

```python
if n_components >= 1 and n_components < .8 * min(X.shape):
```

instead. n_components in (0, 1) means using the explained variance ratio to select the actual number of components once a full SVD has been performed.
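A minimal sketch of that fractional n_components behaviour, assuming the post-merge API (the 0.9 threshold and the data are illustrative, not from the PR):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 20)

# A float in (0, 1) keeps the smallest number of components whose
# cumulative explained variance ratio exceeds that fraction.
pca = PCA(n_components=0.9, svd_solver="full").fit(X)
```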
I opened a numpy issue for the scipy-dev-wheels failure: numpy/numpy#7403. Not sure about the Python 3.5 failure; maybe a temporary glitch?
In particular this doesn't really make sense: MKL is now available by default and doesn't require a licence ...
The AppVeyor failure for 32-bit Python looks genuine AFAICT: …
@giorgiop Better to use np.allclose or assert_array_almost_equal then :P
@giorgiop Can you rebase over master?
Ahah! Reverting ...
Merged!!!!

Fantastic. Great team effort!
> Fantastic. Great team effort!

Whoooot!!!!!! Thank you, in the name of global warming, thank you for your contribution to making everybody's code run faster.
Hm, by making …

You are right. We hadn't thought of that at the time!
Also, could it be that the resulting components can now have a different sign because …
No, I meant that with the introduction of …

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

np.random.seed(1)
x = np.random.rand(10, 5000)
x[::2] += 0.5 * x[::2] + 2 * np.random.rand(5, 5000)
x[-1, :] *= 10
pca = PCA()
pca.fit(x.T)
plt.imshow(pca.components_, interpolation="nearest")
plt.title("pca.components_ sklearn master")
```

Apparently, rows 4, 5, and 9 have been flipped in the new PCA version. I think it is really bad when results suddenly change without prior notice, especially for such a run-of-the-mill method as PCA.
It is a bit unfortunate, but I think it will be hard for the results to stay exactly the same. @cbrnr, can you point to where svd_flip is now called where it wasn't before? @giorgiop, why was that for consistency between the different solvers? If a different solver is chosen, we cannot really expect the exact same result, though a close result would be good.

@cbrnr I would argue that relying on the signs of eigenvectors is a really bad idea, and if you expect them to stay the same, you run into trouble either way. Without the sign flip, the outcome is random anyhow (created by minor choices in the SVD solver). Why did this come up? [I totally got bitten by this when writing my book, btw]
|
@cbrnr I would argue that relying on the signs of eigenvectors is a really bad
idea, and if you expect them to stay the same, you run into trouble either way.
Without the sign flip, the outcome is random any how (created by minor choices
in the SVD solver).
+1. It's in the range of numerical instabilities.
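For context, svd_flip (from sklearn.utils.extmath; current signature assumed) resolves the ambiguity by forcing a deterministic sign convention without changing the factorization:

```python
import numpy as np
from scipy.linalg import svd
from sklearn.utils.extmath import svd_flip

rng = np.random.RandomState(0)
X = rng.rand(20, 5)
U, S, Vt = svd(X, full_matrices=False)

# Force the entry with the largest absolute value in each column of U
# to be positive; the matching rows of Vt are flipped accordingly,
# so the product U @ diag(S) @ Vt is unchanged.
U, Vt = svd_flip(U, Vt)
```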
@amueller I'm not saying that the current solution using …
I'm just saying that it is not good practice that results of a standard algorithm suddenly change without prior notice, that's all (I spent some time figuring out why I suddenly got different results even though I hadn't changed anything in my code).

While I very strongly agree that results of algorithms should not be changed without warning from one release to another, in this specific case the sign of the PCA is undetermined. It might flip from one computer to another, or when changing the underlying linear algebra libraries.

Anyway, maybe a warning or a note in the next release would be helpful (preferably one that is displayed when using PCA).

Once again, I find that a warning here would not be a good thing, because anyhow the result was undefined previously. So people would be getting warnings about undefined behavior. That said, a note in the release notes explaining the problem would be good. Do you want to do a pull request? It's in doc/whats_new.rst.
Yes, you're right, in this case the results were indeed undetermined. I'll work on the PR explaining the problem in the docs.
> I'll work on the PR explaining the problem in the docs.

Thanks, that's extremely useful!
Maybe a note in the API changes section (maybe that section needs a better title; I feel it should be "things you need to be careful about when updating", only shorter ;))
@cbrnr And I think we all agree that unexpected changes are always bad. We try our best to avoid them.
@amueller Maybe "Important changes" instead of "API changes"? Anyway, I'll add something to this section. Also, I didn't want to complain; of course such (and much worse) things happen, and I'm happy to contribute!
```python
M = M.T
```

```python
# Adjust n_iter. 7 was found a good compromise for PCA. See #5299
if n_components < .1 * min(M.shape) and n_iter < 7:
```
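The heuristic above tunes the power-iteration count of the randomized solver; a sketch using sklearn.utils.extmath.randomized_svd (parameter names assumed from the current scikit-learn API, the matrix is illustrative):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.RandomState(0)
M = rng.rand(200, 50)

# When n_components is small relative to min(M.shape), extra power
# iterations (n_iter) improve the accuracy of the approximation.
U, S, Vt = randomized_svd(M, n_components=3, n_iter=7, random_state=0)
```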
This behavior is really strange. I'm trying to fix it in #7311.
I don't understand why running SVD twice on the same matrix can lead to principal components with signs flipped.
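For what it's worth, the flip is not a bug in either run: the SVD only determines each singular-vector pair up to a simultaneous sign change, so two correct runs may disagree on signs. A minimal numpy/scipy sketch:

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.RandomState(0)
X = rng.rand(10, 4)
U, S, Vt = svd(X, full_matrices=False)

# Flip the sign of one matched left/right singular-vector pair:
# the reconstruction is identical, so either sign is a valid answer.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1
```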


Fixes #5243 and #3930.

To do:

- `svd_solver == 'arpack'`
- `auto` policy
- `IncrementalPCA` by inheritance from `_BasePCA`; see [MRG+2] Incremental PCA #3285
- add `flip_sign` param, True by default (update: we flip by default, without introducing a param for controlling the behaviour)