BUG: Fix lobpcg() largest option#9352
Conversation
| def _assert_symmetric(M, rtol=1e-5, atol=1e-8): | ||
| assert_allclose(M.T.conj(), M, rtol=rtol, atol=atol) | ||
| # def _assert_symmetric(M, rtol=1e-5, atol=1e-8): | ||
| # assert_allclose(M.T.conj(), M, rtol=rtol, atol=atol) |
There was a problem hiding this comment.
Just to be sure--the fact that the math here is rather challenging justifies retaining an unused function as a comment? I see the mathematician providing this solution in the linked issue has suggested to comment out.
There was a problem hiding this comment.
This is a work-in-progress solution - the lines will be either removed or a warning will be issued. @lobpcg wants to collect more data on why np.float32 matrices sometimes fail, see the linked issue comment.
There was a problem hiding this comment.
You can add "WIP:" to the title to make it a bit more obvious
There was a problem hiding this comment.
Done, thanks for the tip!
|
These bug fixes are needed to pass the tests in scikit-learn/scikit-learn#12319 |
|
It is currently WIP so only would attract interested people. Is this ready for review? Regardless, I think scikit-learn is following the official releases and not the master branch. Hence it will have to wait the release in any case. |
|
It is WIP only because there are a few lines commented out, for some potential future attention. If it is a problem for the review, these lines can of course be just removed, or the tolerance values chosen so large that the 2 assertions always trivially pass. Yet another alternative is to put these 2 assertions inside scikit-learn test Not sure what it actually means. |
|
Yes, I think it is ready, except resolving what to do with those now commented-out symmetry checks. |
|
I have removed the WIP status. |
|
I don't know if it is relevant in the sparse case but in the dense case, the symmetry check is often done via import numpy as np
from scipy.linalg import norm
# Mat is the supposedly hermitian matrix
Antisym = Mat - Mat.conj().T
if norm(Antisym, 1) > max(np.spacing(10**a), (10**b)*norm(Mat, 1)):
raise ValueError('not symmetric enough')where If it is of critical importance it has to be added in and tuned to avoid future maintenance burden, otherwise better be taken out to reduce the clutter. |
- replace _assert_symmetric() with new _report_nonhermitian()
|
Thanks, @ilayn. So I have replaced the What do you think? |
|
@rc To be able to control and fix all the errors, I have made a local copy at https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py for PR scikit-learn/scikit-learn#12319 We should try to keep in full sync with the original LOBPCG in sklearn is used to solve normal eigenvalue problems X'*X as called from https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/extmath.py function lobpcg_svd If you agree with this plan, please put WIP label back here, since merging may take some time and having a local copy of lobpcg at sklearn removes the urgency of this PR. What do you think? |
|
Yes, this code in SciPy was written well before the automatic style checking with CI tools and I was, IIRC, following the style/naming conventions you used in the matlab version - many things need to be style-corrected to adhere to PEP8. However, I would not mix the style update with this PR - it is IMHO better to have several smaller PRs than one large do-it-all kind - it is certainly easier for reviewers. So, if you agree, I would port this PR fixes to your version in scikit-learn, and I would not mark this WIP to get it merged as soon as we get the green light? The style and other updates then may come in other PRs. We also need to discuss with the SciPy developers and maintainers your broader plans on making the generic LOBPCG algorithm version with various backends - let us create a new issue for that? |
|
If small increments are better, we should surely put this PR through as is now, pending only requested review changes. For the next PR, I would propose making a PEP8 etc compliant lobpcg.py and may be adding more tests (I have not yet looked at scipy lobpcg tests...) I hope that I am getting close to PEP8 etc compliant lobpcg.py in https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py for PR scikit-learn/scikit-learn#12319 If you agree, it would be grate if you could help with it, since some of the errors are way above my beginner Python knowledge. Having this done, we would have a solid base to discuss with the SciPy developers and maintainers the broader plans on making the generic LOBPCG algorithm version with various backends. |
|
OK. I have just forked scikit-learn - I will try fixing remaining style errors tomorrow. Let's move further discussion to scikit-learn/scikit-learn#12319. |
@rc Great, thanks! I stop making changes in lobpcg there, waiting for your fixes. Please merge the changes made for this PR there - I have not. |
|
@ilayn, @tylerjereddy, could you review, please? We have made additional improvements to the LOBPCG solver, but it is better to create a new focused PR instead of adding them to this one, right? |
| md = M - M.T.conj() | ||
|
|
||
| nmd = norm(md, 1) | ||
| tol = max(np.spacing(10**a), (10**b)*norm(M, 1)) |
There was a problem hiding this comment.
I made a mistake here this line should probably be
tol = np.spacing(max(10**a, (10**b)*norm(M, 1)))I am typing on the phone so I can't see all at once but the idea is simply to take the maximum of 10**a and some scaling * norm of the matrix and take the np.spacing of it.
However I don't know what does print statements do. Are you supposed to give a warning there?
There was a problem hiding this comment.
I will fix tol (if this function survives, see below), thanks!
It is not really a warning, this function is only meant to print info on potential non-symmetry when the verbosity level is non-zero. I have added it instead of the commented-out assertions. I am not sure about its usefulness, though, @lobpcg?
There was a problem hiding this comment.
In principle, it is useful to cheaply check the symmetry, especially since the matrices of the eigenproblem to solve can be passed as user-defined functions...
It is probably best to use one of the already existing functions, such as
https://docs.scipy.org/doc/numpy/reference/generated/numpy.testing.assert_allclose.html#numpy.testing.assert_allclose
The specific values of rtol and atol are tricky to choose well, since they depend on the data type, float32 vs foat64, the matrix size, and the algorithm used for the comparison... However, it is not so practically important now, since we currently put the check only for a non-default verbosityLevel.
@rc whatever changes you finally make here, please remember to sync with https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py and check that it does not lead to formatting errors.
There was a problem hiding this comment.
Yes, choosing rtol and atol is tricky, that is why @ilayn proposed the way above.
|
|
||
| def test_eigs_consistency(): | ||
| n = 20 | ||
| vals = [np.arange(n, dtype=np.float64) + 1] |
There was a problem hiding this comment.
This can be done via
vals = np.arange(1, n+1, dtype=float)Similarly for n=5 below
There was a problem hiding this comment.
Thanks, that is what careless copy-pasting without reading does :)
|
I have tried to address the review comments, let us know what more needs to be done to get this PR merged. FYI: @lobpcg and me have more updates and significant fixes coming (now living in scikit-learn/scikit-learn#12319), that build on the fixes here. |
tylerjereddy
left a comment
There was a problem hiding this comment.
I added a few more minor comments -- maybe @ilayn can take one more look.
@pv opened the original issue -- if I'm not mistaken this solution is slightly different from the early-stage proposal of deprecating largest. Is that ok with you Pauli?
I see that scikit-learn is depending on the resolution of this PR. Good things here are the apparently minimal disruptions to any old tests, the fact that the CI is all green, and apparently a few experienced mathematicians have commented either here / in the linked issue.
| nmd = norm(md, 1) | ||
| tol = np.spacing(max(10**a, (10**b)*norm(M, 1))) | ||
| if nmd > tol: | ||
| print('matrix %s is not enough Hermitian for a=%d, b=%d:' |
There was a problem hiding this comment.
| print('matrix %s is not enough Hermitian for a=%d, b=%d:' | |
| print('matrix %s is not sufficiently Hermitian for a=%d, b=%d:' |
| _check_eigen(A, lvals20, lvecs20, atol=1e-3, rtol=0) | ||
| assert_allclose(vals20, lvals20, atol=1e-14) | ||
|
|
||
| # This tests the alternative branch using eigh(). |
There was a problem hiding this comment.
This could be a separate unit test for clarity maybe
There was a problem hiding this comment.
You mean the alternative branch test? The alternative branch means here the small matrix code path in lobpcg().
There was a problem hiding this comment.
Great, maybe add that as a comment if you split off to a second test or use parametrization--the two tests repeat a lot of the same logic.
| assert_allclose(vals5, lvals5, atol=1e-14) | ||
|
|
||
| def test_verbosity(): | ||
| """Check that nonzero verbosity level code runs. |
There was a problem hiding this comment.
so we're not asserting anything here--nothing gets printed because the matrices are sufficiently Hermitian?
There was a problem hiding this comment.
The idea behind this test was to run the code paths that are not normally run in the default (silent) operation - there are other print statements in lobpcg(). So yes, we are not asserting anything. And actually, for the current settings of a=3, b=-1 in _report_nonhermitian() calls the gramA matrix is reported non-Hermitian. This could be fixed either by tweaking the parameters, or removing the _report_nonhermitian() completely. A cleaner solution how to provide optional diagnostics to users might be to support callbacks at suitable places in lobpcg() - what do you think?
There was a problem hiding this comment.
I'm not sure I now enough about the issue to ask for any substantive additional changes--I might politely suggest maybe just asserting that you get an expected result when probing the verbosity handling, but maybe not crucial.
There was a problem hiding this comment.
I have changed the verbosity level to 11 to make sure really all related code is executed. Subsequently, I have fixed two bugs. The code runs now without errors and this is the expected result - I am not sure how to assert that in the test? :)
|
If I understand correctly, this PR changes what Can you also add |
|
@pv PR fixes both issues in #4174 for "largest=True":
The current unchanged default "largest=True" is also the default in ARPACK, making it consistent with lobpcg. There is no change for
|
|
Do as you see best. However, the backward incompatible change in what
`lobpcg(A, x)` returns (larges eigenvalues by default, not the
smallest) must be mentioned in release notes, i.e. added to this page
after this PR is merged:
https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.2.0
|
|
@pv Thanks for your comments! Just to clarify - lobpcg has always returned the largest eigenvalues by default. This has not changed in this PR, so it is fully backward compatible. But the results for
|
| _check_eigen(A, lvals20, lvecs20, atol=1e-3, rtol=0) | ||
| assert_allclose(vals20, lvals20, atol=1e-14) | ||
|
|
||
| # This tests the alternative branch using eigh(). |
There was a problem hiding this comment.
Great, maybe add that as a comment if you split off to a second test or use parametrization--the two tests repeat a lot of the same logic.
| assert_allclose(vals5, lvals5, atol=1e-14) | ||
|
|
||
| def test_verbosity(): | ||
| """Check that nonzero verbosity level code runs. |
There was a problem hiding this comment.
I'm not sure I now enough about the issue to ask for any substantive additional changes--I might politely suggest maybe just asserting that you get an expected result when probing the verbosity handling, but maybe not crucial.
| eigs,vecs = lobpcg(A, X, B=B, tol=1e-5, maxiter=30, largest=False, | ||
| verbosityLevel=1) | ||
| eigs,vecs = lobpcg(A, X, B=B, tol=1e-5, maxiter=30, largest=False, | ||
| verbosityLevel=10) |
There was a problem hiding this comment.
The lines are identical apart from the verbosityLevel. I'm tempted to suggest parametrization here too, but I know some people don't like that unless there are > 2 cases.
- remove pause() and its use - remove non-existent precision argument in save()
|
@tylerjereddy I have now some spare time to work on this - let me know if there are some additional issues. |
|
@rc Maybe if you have a minute adding a concise release note to https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.2.0 could be helpful |
|
@tylerjereddy Thanks for merging, I have added a paragraph to the release notes. |
|
#4174 can be closed now. |
|
done |
This PR applies fixes proposed by @lobpcg to lobpcg(), and addresses #4174.