BUG: Fix lobpcg() largest option by rc · Pull Request #9352 · scipy/scipy

rc · 2018-10-04T11:42:34Z

This PR applies fixes proposed by @lobpcg to lobpcg(), and addresses #4174.

- fix proposed by Andrew Knyazev

tylerjereddy · 2018-10-05T02:18:05Z

scipy/sparse/linalg/eigen/lobpcg/lobpcg.py

-def _assert_symmetric(M, rtol=1e-5, atol=1e-8):
-    assert_allclose(M.T.conj(), M, rtol=rtol, atol=atol)
+# def _assert_symmetric(M, rtol=1e-5, atol=1e-8):
+#     assert_allclose(M.T.conj(), M, rtol=rtol, atol=atol)


Just to be sure--the fact that the math here is rather challenging justifies retaining an unused function as a comment? I see the mathematician providing this solution in the linked issue has suggested to comment out.

This is a work-in-progress solution - the lines will be either removed or a warning will be issued. @lobpcg wants to collect more data on why np.float32 matrices sometimes fail, see the linked issue comment.

You can add "WIP:" to the title to make it a bit more obvious

Done, thanks for the tip!

lobpcg · 2018-10-07T16:29:50Z

These bug fixes are needed to pass the tests in scikit-learn/scikit-learn#12319
@ilayn Is there a tentative plan/timeline for the actual merge of this PR, so that I can plan my scikit-learn contribution accordingly?

ilayn · 2018-10-07T16:57:54Z

It is currently WIP, so it would only attract interested people. Is this ready for review?

Regardless, I think scikit-learn is following the official releases and not the master branch. Hence it will have to wait for the release in any case.

lobpcg · 2018-10-07T18:52:47Z

It is WIP only because there are a few lines commented out, for some potential future attention. If it is a problem for the review, these lines can of course be just removed, or the tolerance values chosen so large that the 2 assertions always trivially pass. Yet another alternative is to put these 2 assertions inside
if verbosityLevel > 10:

scikit-learn test
https://ci.appveyor.com/project/sklearn-ci/scikit-learn/builds/19323001/job/f30rwvudp8vu2wyn
ran today uses
https://files.pythonhosted.org/packages/c4/f3/752fd6778a9d07fddb2b02dac5895287e594d2d0d156a2a422c710f6a851/scipy-1.1.0-cp37-none-win_amd64.whl

Not sure what it actually means.

rc · 2018-10-07T18:57:36Z

Yes, I think it is ready, except resolving what to do with those now commented-out symmetry checks.

rc · 2018-10-07T19:00:15Z

I have removed the WIP status.

ilayn · 2018-10-08T03:10:52Z

I don't know if it is relevant in the sparse case but in the dense case, the symmetry check is often done via

import numpy as np
from scipy.linalg import norm

# Mat is the supposedly hermitian matrix
Antisym = Mat - Mat.conj().T

if norm(Antisym, 1) > max(np.spacing(10**a), (10**b)*norm(Mat, 1)):
    raise ValueError('not symmetric enough')

where a, b are typically around 3 and -1. In my personal experience, these quantities are tricky to tune via rtol and atol since they depend on the matrix data, hence the norm-based code.

If it is of critical importance it has to be added in and tuned to avoid future maintenance burden, otherwise better be taken out to reduce the clutter.

rc · 2018-10-08T13:59:45Z

Thanks, @ilayn.

So I have replaced _assert_symmetric() with a new _report_nonhermitian() that gets called for a nonzero verbosity and only reports the nonhermitian gram matrices, instead of raising an exception. In this way, a user might get more info in case of non-convergence. I have also created a new test test_verbosity() that calls lobpcg() with non-zero verbosity levels. For the moment, it just uses the elastic rod matrices.

What do you think?

lobpcg · 2018-10-08T15:17:05Z

@rc To be able to control and fix all the errors, I have made a local copy at https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py for PR scikit-learn/scikit-learn#12319

We should try to keep in full sync with the original
from scipy.sparse.linalg import lobpcg

LOBPCG in sklearn is used to solve normal eigenvalue problems X'*X as called from https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/extmath.py function lobpcg_svd
It goes through multiple tests performing PCA and truncated_svd, and has multiple failures that I am fixing. It appears that Travis CI and continuous-integration/appveyor checks may follow stricter standards there (e.g., lines 79 vs. 80 characters long), giving many formatting errors from lobpcg.py, which I am also fixing. In these fixes, I have now deviated quite a bit from the original, which is bad. And I am still getting formatting errors from lobpcg.py, so the editing is not yet finished... Could you please diff this original vs. https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py and try to merge so we have a single version that passes all tests both here and at sklearn? My original did not include the latest changes you have made in this PR, so they need to be merged. I solved for the smallest of -X'*X in lobpcg_svd to work around the trouble with the largest.

If you agree with this plan, please put WIP label back here, since merging may take some time and having a local copy of lobpcg at sklearn removes the urgency of this PR.

What do you think?

rc · 2018-10-08T19:38:18Z

Yes, this code in SciPy was written well before the automatic style checking with CI tools and I was, IIRC, following the style/naming conventions you used in the matlab version - many things need to be style-corrected to adhere to PEP8.

However, I would not mix the style update with this PR - it is IMHO better to have several smaller PRs than one large do-it-all kind - it is certainly easier for reviewers. So, if you agree, I would port the fixes from this PR to your version in scikit-learn, and I would not mark this WIP, to get it merged as soon as we get the green light? The style and other updates then may come in other PRs.

We also need to discuss with the SciPy developers and maintainers your broader plans on making the generic LOBPCG algorithm version with various backends - let us create a new issue for that?

lobpcg · 2018-10-08T19:54:41Z

If small increments are better, we should surely put this PR through as is now, pending only requested review changes.

For the next PR, I would propose making a PEP8 etc. compliant lobpcg.py and maybe adding more tests (I have not yet looked at the scipy lobpcg tests...). I hope that I am getting close to a PEP8 etc. compliant lobpcg.py in https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py for PR scikit-learn/scikit-learn#12319. If you agree, it would be great if you could help with it, since some of the errors are way above my beginner Python knowledge. Having this done, we would have a solid base to discuss with the SciPy developers and maintainers the broader plans on making the generic LOBPCG algorithm version with various backends.

rc · 2018-10-08T20:04:37Z

OK. I have just forked scikit-learn - I will try fixing remaining style errors tomorrow. Let's move further discussion to scikit-learn/scikit-learn#12319.

lobpcg · 2018-10-08T20:10:58Z

@rc Great, thanks! I will stop making changes in lobpcg there and wait for your fixes. Please merge the changes made for this PR there - I have not.

rc · 2018-10-12T07:29:01Z

@ilayn, @tylerjereddy, could you review, please? We have made additional improvements to the LOBPCG solver, but it is better to create a new focused PR instead of adding them to this one, right?

ilayn · 2018-10-12T11:53:26Z

scipy/sparse/linalg/eigen/lobpcg/lobpcg.py

+    md = M - M.T.conj()

+    nmd = norm(md, 1)
+    tol = max(np.spacing(10**a), (10**b)*norm(M, 1))


I made a mistake here; this line should probably be

tol = np.spacing(max(10**a, (10**b)*norm(M, 1)))

I am typing on the phone so I can't see all at once but the idea is simply to take the maximum of 10**a and some scaling * norm of the matrix and take the np.spacing of it.

However, I don't know what the print statements do. Are you supposed to give a warning there?

I will fix tol (if this function survives, see below), thanks!

It is not really a warning; this function is only meant to print info on potential non-symmetry when the verbosity level is non-zero. I have added it instead of the commented-out assertions. I am not sure about its usefulness, though, @lobpcg?

In principle, it is useful to cheaply check the symmetry, especially since the matrices of the eigenproblem to solve can be passed as user-defined functions...

It is probably best to use one of the already existing functions, such as
https://docs.scipy.org/doc/numpy/reference/generated/numpy.testing.assert_allclose.html#numpy.testing.assert_allclose

The specific values of rtol and atol are tricky to choose well, since they depend on the data type, float32 vs float64, the matrix size, and the algorithm used for the comparison... However, it is not so practically important now, since we currently put the check only for a non-default verbosityLevel.

@rc whatever changes you finally make here, please remember to sync with https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/lobpcg.py and check that it does not lead to formatting errors.

Yes, choosing rtol and atol is tricky, that is why @ilayn proposed the way above.
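
For reference, the norm-based check discussed in this thread can be sketched roughly as below. This is only an illustration of the corrected tol formula, not the code merged in this PR: the function name hermitian_defect, the test matrices, and the use of numpy.linalg.norm (instead of scipy.linalg.norm) are assumptions made to keep the example self-contained, and the spacing is evaluated in float64 regardless of the matrix dtype.

```python
import numpy as np


def hermitian_defect(M, a=3, b=-1):
    """Return (defect, tol) for a norm-based Hermitian check.

    Hypothetical helper: defect is the 1-norm of M - M^H, and tol is the
    floating-point spacing at max(10**a, 10**b * ||M||_1).
    """
    M = np.asarray(M)
    defect = np.linalg.norm(M - M.conj().T, 1)
    scale = max(10.0**a, 10.0**b * np.linalg.norm(M, 1))
    return defect, np.spacing(scale)


G = np.arange(36.0).reshape(6, 6)
S = G + G.T          # symmetric by construction
P = S.copy()
P[0, 1] += 1e-6      # break the symmetry in a single entry

for name, mat in [("S", S), ("P", P)]:
    defect, tol = hermitian_defect(mat)
    print(name, "sufficiently Hermitian:", bool(defect <= tol))
```

With a=3 and b=-1 the threshold is close to machine precision for moderately scaled matrices, so even a 1e-6 perturbation is flagged; this may be part of why float32 inputs are harder to handle with a single pair of parameters.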

ilayn · 2018-10-12T11:58:17Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+
+def test_eigs_consistency():
+    n = 20
+    vals = [np.arange(n, dtype=np.float64) + 1]


This can be done via

vals = np.arange(1, n+1, dtype=float)

Similarly for n=5 below

Thanks, that is what careless copy-pasting without reading does :)

rc · 2018-10-22T20:40:21Z

I have tried to address the review comments, let us know what more needs to be done to get this PR merged.

FYI: @lobpcg and I have more updates and significant fixes coming (now living in scikit-learn/scikit-learn#12319) that build on the fixes here.

tylerjereddy

I added a few more minor comments -- maybe @ilayn can take one more look.

@pv opened the original issue -- if I'm not mistaken this solution is slightly different from the early-stage proposal of deprecating largest. Is that ok with you Pauli?

I see that scikit-learn is depending on the resolution of this PR. Good things here are the apparently minimal disruptions to any old tests, the fact that the CI is all green, and apparently a few experienced mathematicians have commented either here / in the linked issue.

tylerjereddy · 2018-10-23T05:01:10Z

scipy/sparse/linalg/eigen/lobpcg/lobpcg.py

+    nmd = norm(md, 1)
+    tol = np.spacing(max(10**a, (10**b)*norm(M, 1)))
+    if nmd > tol:
+        print('matrix %s is not enough Hermitian for a=%d, b=%d:'


Suggested change

print('matrix %s is not enough Hermitian for a=%d, b=%d:'

print('matrix %s is not sufficiently Hermitian for a=%d, b=%d:'

tylerjereddy · 2018-10-23T05:11:09Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+    _check_eigen(A, lvals20, lvecs20, atol=1e-3, rtol=0)
+    assert_allclose(vals20, lvals20, atol=1e-14)
+
+    # This tests the alternative branch using eigh().


This could be a separate unit test for clarity maybe

You mean the alternative branch test? The alternative branch means here the small matrix code path in lobpcg().

Great, maybe add that as a comment if you split off to a second test or use parametrization--the two tests repeat a lot of the same logic.

tylerjereddy · 2018-10-23T05:12:24Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+    assert_allclose(vals5, lvals5, atol=1e-14)
+
+def test_verbosity():
+    """Check that nonzero verbosity level code runs.


so we're not asserting anything here--nothing gets printed because the matrices are sufficiently Hermitian?

The idea behind this test was to run the code paths that are not normally run in the default (silent) operation - there are other print statements in lobpcg(). So yes, we are not asserting anything. And actually, for the current settings of a=3, b=-1 in the _report_nonhermitian() calls, the gramA matrix is reported as non-Hermitian. This could be fixed either by tweaking the parameters or by removing _report_nonhermitian() completely. A cleaner way to provide optional diagnostics to users might be to support callbacks at suitable places in lobpcg() - what do you think?

I'm not sure I know enough about the issue to ask for any substantive additional changes--I might politely suggest maybe just asserting that you get an expected result when probing the verbosity handling, but maybe not crucial.

I have changed the verbosity level to 11 to make sure really all related code is executed. Subsequently, I have fixed two bugs. The code runs now without errors and this is the expected result - I am not sure how to assert that in the test? :)
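
One possible way to assert something in such a test, assuming the diagnostics are emitted with plain print(), is to capture stdout and check what was written. The sketch below uses a hypothetical stand-in function noisy_solver() rather than lobpcg() itself, so the messages and thresholds are illustrative only.

```python
import contextlib
import io


def noisy_solver(verbosity_level=0):
    """Hypothetical stand-in for a solver that prints diagnostics."""
    if verbosity_level > 0:
        print("iteration 1")
    if verbosity_level > 10:
        print("detailed state")
    return "done"


def run_captured(verbosity_level):
    """Run the stand-in solver and return (result, captured stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = noisy_solver(verbosity_level)
    return result, buffer.getvalue()


for level in (0, 1, 11):
    result, output = run_captured(level)
    print(level, result, len(output.splitlines()))
```

In a pytest-based suite the capsys fixture would serve the same purpose; the point is only that "the verbose code ran and printed something" can be turned into an assertion.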

pv · 2018-10-23T19:10:07Z

If I understand correctly, this PR changes what lobpcg returns if the largest option is not specified. If that's right, I'd suggest changing the default value to largest=False (which was effectively the previous default value) --- the results probably will be in different order, but at least from the same end of the spectrum as previously.

Can you also add largest=False to test_eigs_consistency. Granted, it's tested in the other tests, but those consider more complex test matrices.

lobpcg · 2018-10-23T19:36:03Z

@pv This PR fixes both issues in #4174 for "largest=True":

- finding the correct, largest, eigenvalues for all matrix sizes - a mistake in the code, now fixed.
- outputting the largest in a "natural", descending order, just as in the ARPACK solver

The current unchanged default "largest=True" is also the default in ARPACK, making it consistent with lobpcg.

There is no change for largest=False, which returns the smallest in ascending order, also consistent with ARPACK. Full consistency of lobpcg and ARPACK minimizes user confusion and makes it simple to write comparative tests, e.g., those already coded for PCA and partial SVD in scikit-learn/scikit-learn#12319
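
The ordering convention described above can be illustrated with a small NumPy-only sketch. The helper select_extreme() is hypothetical and is not the _get_indx() helper from this PR; it only shows "k largest in descending order" versus "k smallest in ascending order".

```python
import numpy as np


def select_extreme(vals, k, largest=True):
    """Hypothetical helper: indices of the k extreme eigenvalues.

    Descending order for largest=True, ascending for largest=False.
    """
    order = np.argsort(vals)        # indices sorting vals ascending
    if largest:
        return order[::-1][:k]      # k largest, descending
    return order[:k]                # k smallest, ascending


vals = np.array([3.0, 1.0, 5.0, 2.0, 4.0])
print(vals[select_extreme(vals, 2, largest=True)])    # [5. 4.]
print(vals[select_extreme(vals, 2, largest=False)])   # [1. 2.]
```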

pv · 2018-10-23T20:13:37Z

Do as you see best. However, the backward incompatible change in what `lobpcg(A, x)` returns (largest eigenvalues by default, not the smallest) must be mentioned in release notes, i.e. added to this page after this PR is merged: https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.2.0

lobpcg · 2018-10-23T21:23:34Z

@pv Thanks for your comments! Just to clarify - lobpcg has always returned the largest eigenvalues by default. This has not changed in this PR, so it is fully backward compatible. But the results for largest=True (default or not, either way) were wrong before this PR, so hopefully nobody used it this way. This mistake is now fixed in this PR. And the old work-around to compute the smallest of the negative matrix by explicitly setting largest=False also works, no change. I agree that the release notes must mention that the largest=True (default) option is now fully operational and that the default and the output format in lobpcg are now consistent with that of ARPACK.
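
The work-around mentioned here relies on the fact that the largest eigenvalues of A are the negated smallest eigenvalues of -A. A minimal check of that identity, using dense numpy.linalg.eigvalsh on an arbitrary small symmetric matrix instead of lobpcg (an assumption made to keep the example self-contained), could look like this:

```python
import numpy as np

B = np.arange(16.0).reshape(4, 4)
A = B + B.T                      # small symmetric test matrix
k = 2

# eigvalsh returns eigenvalues in ascending order.
top_direct = np.linalg.eigvalsh(A)[::-1][:k]       # k largest, descending
top_via_negation = -np.linalg.eigvalsh(-A)[:k]     # smallest of -A, sign flipped

print(np.allclose(top_direct, top_via_negation))
```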

tylerjereddy · 2018-10-28T23:38:11Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+    _check_eigen(A, lvals20, lvecs20, atol=1e-3, rtol=0)
+    assert_allclose(vals20, lvals20, atol=1e-14)
+
+    # This tests the alternative branch using eigh().


Great, maybe add that as a comment if you split off to a second test or use parametrization--the two tests repeat a lot of the same logic.

tylerjereddy · 2018-10-28T23:41:30Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+    assert_allclose(vals5, lvals5, atol=1e-14)
+
+def test_verbosity():
+    """Check that nonzero verbosity level code runs.


I'm not sure I know enough about the issue to ask for any substantive additional changes--I might politely suggest maybe just asserting that you get an expected result when probing the verbosity handling, but maybe not crucial.

tylerjereddy · 2018-10-28T23:44:11Z

scipy/sparse/linalg/eigen/lobpcg/tests/test_lobpcg.py

+    eigs,vecs = lobpcg(A, X, B=B, tol=1e-5, maxiter=30, largest=False,
+                       verbosityLevel=1)
+    eigs,vecs = lobpcg(A, X, B=B, tol=1e-5, maxiter=30, largest=False,
+                       verbosityLevel=10)


The lines are identical apart from the verbosityLevel. I'm tempted to suggest parametrization here too, but I know some people don't like that unless there are > 2 cases.

rc · 2018-11-05T14:10:58Z

@tylerjereddy I have now some spare time to work on this - let me know if there are some additional issues.

tylerjereddy

Ok, this has been through a few rounds of review and there's a fair bit of expert commenting.

CI is all green & Pauli's backward-compatibility concerns seem to be addressed.

Thanks @rc and @lobpcg

tylerjereddy · 2018-11-05T18:08:10Z

@rc Maybe if you have a minute adding a concise release note to https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.2.0 could be helpful

rc · 2018-11-05T19:50:12Z

@tylerjereddy Thanks for merging, I have added a paragraph to the release notes.

rc · 2018-11-05T19:51:47Z

#4174 can be closed now.

ilayn · 2018-11-05T19:56:46Z

done

rc added 4 commits October 4, 2018 13:35

BUG: fix 'largest' option in lobpcg(), new _get_indx() helper

72d2da2

- fix proposed by Andrew Knyazev

WIP: comment out symmetry assertions

6a7a5f8

fix largest order in _get_indx(), simplify

4d44c21

MAINT: update LOBPCG tests

a6cda42

rc mentioned this pull request Oct 4, 2018

lobpcg "largest" option invalid? #4174

Closed

This was referenced Oct 4, 2018

new feature: add LOBPCG as an SVD solver in PCA scikit-learn/scikit-learn#12079

Open

[Closed] adding clusterQR to spectral clustering, and LOBPCG as an SVD solver to PCA and Truncated PCA scikit-learn/scikit-learn#12291

Closed

tylerjereddy added the scipy.sparse.linalg label Oct 5, 2018

tylerjereddy reviewed Oct 5, 2018

View reviewed changes

rc changed the title ~~Fix lobpcg() largest option~~ WIP: Fix lobpcg() largest option Oct 5, 2018

rc added 3 commits October 5, 2018 14:58

TST: new test_eigs_consistency()

2bebb33

STY: clean up

feebd9a

ENH: make lobpcg() output with largest compatible with eigs()

c194429

lobpcg mentioned this pull request Oct 7, 2018

[MRG] add lobpcg svd_solver to PCA and TruncatedSVD scikit-learn/scikit-learn#12319

Closed

rc changed the title ~~WIP: Fix lobpcg() largest option~~ Fix lobpcg() largest option Oct 7, 2018

rc added 2 commits October 8, 2018 15:53

ENH: report nonhermitian gram matrices when nonzere verbosity level

6639668

- replace _assert_symmetric() with new _report_nonhermitian()

TST: new test_verbosity()

1f54b41

ilayn reviewed Oct 12, 2018

View reviewed changes

rc added 2 commits October 12, 2018 14:27

FIX: use NumPy properly

b792e74

FIX: correct tol in _report_nonhermitian()

eb548a3

tylerjereddy reviewed Oct 23, 2018

View reviewed changes

tylerjereddy reviewed Oct 28, 2018

View reviewed changes

rc added 4 commits October 29, 2018 21:56

improve message printed in _report_nonhermitian()

1b3071a

TST: parameterize test_eigs_consistency()

bebf3e7

TST: make sure all verbosityLevel > 0 messages are printed in test_ve…

53f215c

…rbosity()

TST: make test_verbosity() pass

bcf0737

- remove pause() and its use - remove non-existent precision argument in save()

tylerjereddy changed the title ~~Fix lobpcg() largest option~~ BUG: Fix lobpcg() largest option Nov 5, 2018

tylerjereddy added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Nov 5, 2018

tylerjereddy approved these changes Nov 5, 2018

View reviewed changes

tylerjereddy merged commit 43193b9 into scipy:master Nov 5, 2018

tylerjereddy mentioned this pull request Nov 7, 2018

TST: test_eigs_consistency() doesn't have consistent results #9453

Closed

rc mentioned this pull request Nov 9, 2018

BUG: sparse.linalg.eigs: sorting behavior reliability #9460

Closed
