[MRG] add lobpcg svd_solver to PCA and TruncatedSVD#12319
[MRG] add lobpcg svd_solver to PCA and TruncatedSVD#12319lobpcg wants to merge 119 commits intoscikit-learn:mainfrom lobpcg:lobpcg_svd
Conversation
|
This pull request introduces 5 alerts when merging 539dfd4 into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 3 alerts when merging 2beaafc into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 3 alerts when merging 5375a48 into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 3 alerts when merging 17c9196 into 4e2e1fa - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 2 alerts when merging 3546217 into 5fd9e03 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
The development docs say: "The full scikit-learn tests can be run using ‘make’ in the root folder. Alternatively, running ‘pytest’ in a folder will run all the tests of the corresponding subpackages." This works for me: after building the package in-place |
|
This pull request introduces 2 alerts when merging d658176 into 5fd9e03 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
OK, I have your branch now. I am looking at the LGTM alerts. |
|
I am not sure how you want to proceed with working on this branch together, so here is my proposal:
You can add my repo by adding the following to your .git/config: and then just use |
|
OK, I accepted your invitation, and pushed my commit to this PR. Let's see :) |
|
This pull request introduces 1 alert when merging 7db19f8 into 03c3af5 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
OK, I will try debugging the errors (probably tomorrow). Concerning the single LGTM alert |
This means that |
|
Don't worry about the lgtm.com alerts
|
|
This pull request introduces 1 alert when merging d435213 into 00c2f41 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 1 alert when merging eea2083 into 00c2f41 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 1 alert when merging 3661ef5 into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 1 alert when merging 7f9ab5a into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 1 alert when merging 4c93c2f into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
This pull request introduces 1 alert when merging 11ef291 into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
@lobpcg I have fetched your changes, thanks. I do not really understand the |
|
This pull request introduces 1 alert when merging 2c2e5bb into 8985a63 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
|
@lobpcg in what way? Seems OK for me. BTW. I have fixed the docstring errors. |
|
I call git in two alternative ways, both GUI, locally and might have created a conflict locally.
I of course noticed, the fixes are way above my head, great work! All tests are passed now, except for LGTM, which are recommended to be ignored. There is nothing else trivial I can think of is left for us to do. I removed [WIP]. Should I put [MRG]? lobpcg.py is well improved, thanks to your editing. Do you want to PR it to scipy? Add to the existing scipy PR? BTW, in scipy/scipy#9275 I suggest adding lobpcg_svd to scipy, where it naturally belongs. It now can probably be just mostly copy/pasted from https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/extmath.py to scipy... |
The easiest way of fixing that is to fetch from github, and hard-reset your branch (master?) to the fetched one, in case you do not have local changes. If you have non-commited local changes, you can stash them first (
It might be easier to create a new PR, after scipy/scipy#9352 is merged. Of course, just in case scipy maintainers agree to merge it. I will ping them to get some opinions.
OK. |
|
I do not really understand the merge conflict - we just added a function, and the function logsumexp() below it was removed in the master - removing logsumexp() in this branch did not help resolving the problem. |
|
To my vague recollection, the cost/iteration should be m^2n+9mn^2+n^3 in LOBPCG vs probably something like m^2n+mn^2+n^3 in randomized. When m/n is large, the difference 8mn^2 is well dominated by the common first term m^2n, so the cost per iteration is nearly the same, while LOBPCG should always converge faster, often MUCH faster, than randomized, which may become visible even after only a few iterations. The speed improvement depends on the distribution of the singular values, not directly on m/n, that is why it should be inconsistent between different datasets. So the reported tests results for "large" m/n are as expected. Another consideration: "Direct solve" eigh uses n=m and takes m^3 (no iterations) in LAPACK. When m/n is small, like 10, and you need to iterate at least 10 times, it's much better just to run plain LAPACK eigh, so testing this PR on m/n<10 makes little practical sense. In fact, if m/n<5 LOBPCG algorithm does not even run - the lobpcg code simply switches inside to the standard LAPACK solver, moreover, does it very inefficiently, see scipy/scipy#10983, (the user is not expected to run The bottom line is that LOBPCG cannot lose to randomized, unless m/n is small, but then they both should lose to eigh |
|
scipy/scipy#9275 is now merged, adding lobpcg_svd to scipy |
|
@glemaitre or anybody else: I have just updated https://github.com/lobpcg/scikit-learn/tree/master from https://github.com/scikit-learn/scikit-learn I am now trying to update this branch https://github.com/lobpcg/scikit-learn/tree/lobpcg_svd from https://github.com/lobpcg/scikit-learn/tree/master but cannot due to Conflicting files which I do not know how to resolve. Could you please help to resolve the Conflicting files so that I can bring this branch up to date with https://github.com/lobpcg/scikit-learn/tree/master . I just use GitHub Desktop, not really familiar with git commands. |
|
@thomasjpfan -thanks very much for your help! There are some docstring errors/warnings that I still cannot figure out how to fix, But I guess they can be just ignored for now. |
|
Just chiming in to support this PR! |
|
@lobpcg If merged, this would work correctly for a sparse input into PCA (i.e. implicitly center the input), right? |
@dkobak Thanks for your support! Yes, this PR surely also works correctly for a sparse input into PCA, adding a new solver 'lobpcg' within scikit. However, it is an independent separate issue from and does not supersede the implicit centering the input like in #12794. |
|
@lobpcg this pr seems stalled. Do you need any help getting it merged? I'd be happy to contribute commits |
|
@zachmayer yes, thanks for your willingness to help! The known work remained is as follows:
Still interested to help getting it merged? |
|
@lobpcg I'll take a look. Could you elaborate on step 3? I may be able to at least help resolve the merge conflicts |
scipy/scipy#10830 has implemented LOBPCG solver in svds in scipy while this PR being stalled. Which gives yet another option to add lobpcg svd_solver to PCA and TruncatedSVD simply by calling https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html with solver=‘lobpcg’ |
|
Ahh, yeah calling the scipy solver with |
Yes, but one needs to consider that:
|
I think it's fine for a new feature to be only available for newer versions of the dependency as long as we raise a |
|
scipy/scipy#10830 has implemented LOBPCG solver in svds in scipy while this PR being stalled. Which allows using lobpcg svd_solver to PCA and TruncatedSVD simply by calling https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html with solver=‘lobpcg’ |
|
I think I am a bit confused here @lobpcg: the scipy PR was merged two years ago, so why are you only closing this PR now, two years later? In any case, thanks a lot for your efforts! |
|
I thought about coming back to it, factoring in the SciPy implementation, but never found time and then forgot about it. |
|
The comment of @ogrisel gives the way forward since this is merged in SciPy: #12319 (comment) Basically, we have two possibilities: either backport the method in
I am pretty upset when reading these types of comments that I find really indecent. |
|
I surely did not mean to upset anyone, so I apologize. |
Reference Issues/PRs
fixes #12079, fixes #12080
What does this implement/fix? Explain your changes.
#12079 adds LOBPCG as an SVD solver in PCA
#12080 adds LOBPCG solver to Truncated PCA
lobpcg_svd should also be useful in KernelPCA for faster partial decompositions, see #12068
This PR also includes multiple LOBPCG related bug fixes, including vendoring sklearn/externals/_lobpcg.py from scipy 1.3.0
Any other comments?
@ogrisel Transferred from permanently closed PR #12291
Keep in mind for testing, that lobpcg_svd falls back to dense eigensolver unless n_components < 3*matrix_size, where matrix_size = min (n_samples, n_features)
Still to do, better in new focused PRs after this one is merged
example plot_faces_decomposition may include lobpcg_svd, just change
('Eigenfaces - PCA using randomized SVD',
decomposition.PCA(n_components=n_components, svd_solver='randomized',
whiten=True),
True),
to
but lobpcg currently fails here for unclear numerical reasons. More testing may be needed for float32 data, like in this example.