[WIP] Make spectral embedding unconditionally deterministic and related fixes #4299
raghavrv wants to merge 1 commit into scikit-learn:master
Conversation
force-pushed 1d79ac5 to 095401a
I don't think we should change the affinity here. Results should be deterministic no matter what.
I don't like the global random state. That means if you only run particular tests, the result will be different than if you run all tests. That can make debugging tricky.
Ah! I didn't think of that... ;) will delete that commit...!
Don't delete it, just
I actually unnecessarily refactored out the previously correctly placed rng initializations, assuming it would be neater... So deleting that commit should revert to the previous state, which was correct :)
force-pushed 386a31a to 071fcaf
Should ...? If so, ... To fix them both, should I send a pull request to numpy and add a fix to ...?
force-pushed dc98016 to 9c104a8
Numpy does not support sparse matrices.
done :)
But this still does not fix the issue, right? Why is it not deterministic when the graph is not fully connected?
I'll investigate further and let you know!
sklearn/utils/estimator_checks.py (outdated)
Why a special case in this common check? Once the remaining sign switch is fixed (the constant abs values case), SpectralEmbedding should be fully deterministic and there would be no need for a specific dataset in the checks, right?
Currently 2 things cause the output to be non-deterministic:

- Major one: when the data produces an adjacency graph which is not fully connected. (But a warning is raised stating that spectral embedding may not behave correctly since the adjacency graph is not fully connected, so I am not quite sure if non-deterministic output is to be expected here.)
- Minor one: when there are multiple max abs values, the sign has to be arbitrarily given to the first index etc... (I'll fix this and push the commit soon...)

So yes, if the 1st one is to be fixed then we should not special-case the input... :)
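The "minor" sign fix described above could be sketched roughly like this (a hypothetical helper — the name `deterministic_sign_flip` and the one-eigenvector-per-column layout are my assumptions, not the PR's actual code):

```python
import numpy as np

# Hypothetical helper: flip each eigenvector's sign so that its entry of
# largest absolute value is positive.  np.argmax breaks ties by taking
# the first index, which makes the flip deterministic even when several
# entries share the same max abs value (the case discussed above).
def deterministic_sign_flip(u):
    max_abs_rows = np.argmax(np.abs(u), axis=0)
    signs = np.sign(u[max_abs_rows, np.arange(u.shape[1])])
    return u * signs

u = np.array([[1.0, -2.0],
              [-3.0, 1.0]])
flipped = deterministic_sign_flip(u)
print(flipped)  # the largest-magnitude entry in each column is now positive
```

Since a sign-flipped eigenvector is still an eigenvector, this normalization changes nothing mathematically; it only pins down one of the two valid sign choices.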
Well but the first should also be deterministic! There is no reason that a non-connected graph gives non-deterministic results.
Okay! So I've (or I think I've) narrowed down the issue to a nearly singular graph laplacian when the graph is not fully connected...

- arpack's eigen solver seems to produce non-deterministic eigen vectors when the laplacian matrix has a very low determinant (order e-50). (~~Only "lobpcg" handles this perfectly, producing deterministic output every time... "amg" is not quite stable with nearly singular matrices (raises LinAlgError randomly).~~ Wrong observation: both "lobpcg" and "amg" also tend to throw errors with nearly / fully singular matrices.)
- When this particular S_X, S_y is used, the determinant of the laplacian is of the order e-20, which seems to be handled perfectly by the "arpack", "lobpcg" as well as "amg" eigen solvers.

Could you kindly confirm if my observations are true?

We could fix this by reverting to "lobpcg" when the determinant of the laplacian is very low (but that involves an additional determinant computation step).

(Also note that we currently fall back to the "lobpcg" solver when arpack raises a ~~LinAlgError~~ RuntimeError... which is when the laplacian is perfectly singular.)

NOTE: I made a wrong observation that lobpcg handles singular matrices perfectly. Both "lobpcg" and "amg" also tend to throw errors with nearly / fully singular matrices (I hadn't tested it for different random states back then).
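To see why a non-connected graph leads to a (near-)singular laplacian in the first place, here is a small numpy illustration (my own sketch, not code from the PR): the unnormalized laplacian L = D - A has one zero eigenvalue per connected component, so any disconnected graph makes det(L) vanish.

```python
import numpy as np

# Adjacency matrix of a graph with two disconnected components
# (nodes {0, 1} and nodes {2, 3}).
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Unnormalized graph laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# One zero eigenvalue per connected component; the eigenvectors spanning
# that null space are only determined up to an arbitrary rotation, which
# is where an iterative solver's non-determinism creeps in.
eigenvalues = np.linalg.eigvalsh(L)
n_zero = int(np.sum(np.abs(eigenvalues) < 1e-10))
print(n_zero)  # 2 zero eigenvalues, one per component
```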
What is the change in the result from the solver? Are the eigenvalues too close? And what is the behavior then?
arpack can take a v0 to seed the random vector. Unfortunately, passing a gaussian distributed v0 is not correct. The correct way to do it would be to replicate:
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/ARPACK/SRC/dgetv0.f
That looks complicated though. I am no fortran developer myself.
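As a pragmatic illustration of the v0 idea (not the faithful dgetv0 replication the comment asks for), scipy's `eigsh` does accept a `v0` starting vector, and fixing it makes repeated runs reproducible:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

# A fixed symmetric test matrix.
rng = np.random.RandomState(0)
M = rng.rand(20, 20)
M = M + M.T  # symmetrize

# Without v0, ARPACK draws its own random starting vector, so repeated
# calls may return eigenvectors with flipped signs.  Supplying an
# explicit v0 pins the iteration down.  (As noted above, this does not
# replicate ARPACK's own dgetv0 initialization; it is only a pragmatic
# workaround.)
v0 = rng.uniform(-1, 1, size=M.shape[0])
vecs1 = eigsh(M, k=3, v0=v0)[1]
vecs2 = eigsh(M, k=3, v0=v0)[1]
print(np.allclose(vecs1, vecs2))
```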
Thanks for the comments... This is just a minor confirmation that arpack's eigsh does not raise RuntimeError but instead just gives out random values, which is the root cause of the current issue at hand...
```python
>>> import numpy as np
>>> from scipy.sparse.linalg import eigsh
>>> N = np.array([[0,]*5]*5, dtype=float)
>>> print(eigsh(N, k=1))
(array([ 0.]), array([[-0.32529456],
       [ 0.29618396],
       [ 0.794161  ],
       [-0.04909478],
       [-0.41636105]]))
>>> print(eigsh(N, k=1))
(array([ 0.]), array([[-0.34134676],
       [-0.71507399],
       [ 0.00610712],
       [-0.58006083],
       [ 0.18879544]]))
```
Thanks for confirming @ragv. Let's try to make the common tests use the deterministic dense solver then.
Note: `np.array([[0,]*5]*5, dtype=float)` is better written `np.zeros(shape=(5, 5), dtype=np.float64)`.
> Note: `np.array([[0,]*5]*5, dtype=float)` is better written `np.zeros(shape=(5, 5), dtype=np.float64)`.

Ah thanks!! :)
> Let's try to make the common tests use the deterministic dense solver then.

Do you mean to say that in `set_fast_parameters` we could set `eigen_solver` to `"lobpcg"`?
lobpcg or dense depending on the models. I don't think the notation is consistent.
force-pushed f484f5f to b11b002
@ogrisel Even with
Also copying a chat from gitter -
@ogrisel - Nope... Follow up question. Do you want me to change
Are the eigenvalues exactly equal to zero? Then we should remove the corresponding eigenvalues, right? That should get rid of the non-determinism.
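The "remove the zero eigenvalues" idea could be sketched like this (a hypothetical helper — the name and the tolerance are mine, not the PR's code):

```python
import numpy as np

# Hypothetical helper: drop eigenpairs whose eigenvalue is numerically
# zero, since the eigenvectors spanning the null space of a singular
# laplacian are arbitrary and hence non-deterministic.
def drop_null_space(eigenvalues, eigenvectors, tol=1e-10):
    keep = np.abs(eigenvalues) > tol
    return eigenvalues[keep], eigenvectors[:, keep]

vals = np.array([0.0, 1e-15, 0.5, 2.0])
vecs = np.eye(4)  # one eigenvector per column
vals_kept, vecs_kept = drop_null_space(vals, vecs)
print(vals_kept)  # only the two nonzero eigenvalues remain
```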
For the rename: leave it as is currently. Our policy on what is public and what is private in
sklearn/utils/tests/test_extmath.py (outdated)
force-pushed b11b002 to c4df19d
force-pushed ad684c6 to b8b6bb8
Fixes #4236

Started off as addressing #4249 (comment)

- Add `deterministic_vector_sign_flip` to make sure the sign of the first element of every eigen vector is positive always.
- Make `SpectralEmbedding` unconditionally deterministic even when the graph is not fully connected. (Doesn't fix it fully.) Use the `"lobpcg"` solver in tests instead for `SpectralEmbedding` in the `check_pipeline_consistency` test.
- (Another PR) Make `LaplacianEmbedding` accept lists and add tests for the same.
- Use data that does not lead to the "graph not fully connected" warning, only to avoid cluttering the output.

ping @amueller
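The end-to-end property this PR is after can be checked with a quick script (a sketch; using `affinity="rbf"` here is my choice, to keep the affinity graph fully connected and sidestep the disconnected-graph case discussed in this thread):

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

# Fit twice on the same data with the same random_state; if
# SpectralEmbedding is deterministic, the embeddings should match
# exactly.  affinity="rbf" yields a fully connected affinity graph.
rng = np.random.RandomState(42)
X = rng.rand(30, 5)

e1 = SpectralEmbedding(n_components=2, affinity="rbf",
                       random_state=0).fit_transform(X)
e2 = SpectralEmbedding(n_components=2, affinity="rbf",
                       random_state=0).fit_transform(X)
print(np.allclose(e1, e2))
```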