Skip to content

[WIP] Make spectral embedding unconditionally deterministic and related fixes.#4299

Closed
raghavrv wants to merge 1 commit intoscikit-learn:masterfrom
raghavrv:spectral_embedding_deterministic_2
Closed

[WIP] Make spectral embedding unconditionally deterministic and related fixes.#4299
raghavrv wants to merge 1 commit intoscikit-learn:masterfrom
raghavrv:spectral_embedding_deterministic_2

Conversation

@raghavrv
Copy link
Copy Markdown
Member

Fixes #4236

Started off as addressing #4249 (comment)

  • Fix deterministic_vector_sign_flip to make sure the sign of the first element of every eigen vector is positive always.
  • Fully fix Make spectral_embedding deterministic #4236
  • Make SpectralEmbedding unconditionally deterministic even when the graph is not fully connected. ( Use "lobpcg" solver in tests instead Doesn't fix it fully )
  • Include SpectralEmbedding in the check_pipeline_consistency test.

Make LaplacianEmbedding accept lists and add tests for the same. Another PR

Use a data that does not lead to the "graph not fully connected warning" only to avoid cluttering the output.

  • Use catch warnings instead(?).

ping @amueller

@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch from 1d79ac5 to 095401a Compare February 26, 2015 20:11
@amueller
Copy link
Copy Markdown
Member

I don't think we should change the affinity here. Results should be deterministic no matter what.

@amueller
Copy link
Copy Markdown
Member

I don't like the global random state. That means if you only run particular tests, the result will be different than if you run all tests. That can make debugging tricky.

@raghavrv
Copy link
Copy Markdown
Member Author

That means if you only run particular tests, the result will be different than if you run all tests

Ah! I didn't think of that... ;) will delete that commit...!

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Feb 27, 2015

Ah! I didn't think of that... ;) will delete that commit...!

Don't delete it, just git amend it to move the definition of random_state from a global to a local variable at the top of the test function.

@raghavrv
Copy link
Copy Markdown
Member Author

raghavrv commented Mar 6, 2015

I actually unnecessarily refactored out the previously correctly placed rng initializations assuming it would be neat... So deleting that commit should revert to the previous state which was correct :)

@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch 2 times, most recently from 386a31a to 071fcaf Compare March 6, 2015 12:14
@raghavrv
Copy link
Copy Markdown
Member Author

raghavrv commented Mar 6, 2015

Should affinity_matrix too be tested?

If so, assert_array_almost_equal as well as assert_almost_equal both do not test equality of sparse matrices... (affinity_matrix is sparse if affinity="nearest_neighbors" is used).

To fix them both should I send a pull request to numpy and add a fix to utils.testing?

@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch 4 times, most recently from dc98016 to 9c104a8 Compare March 6, 2015 17:10
@raghavrv raghavrv changed the title Spectral embedding deterministic 2 [MRG] Spectral embedding deterministic 2 Mar 6, 2015
@amueller
Copy link
Copy Markdown
Member

amueller commented Mar 6, 2015

Numpy does not support sparse matrices.
If the data is small, can you just densify?

@raghavrv
Copy link
Copy Markdown
Member Author

raghavrv commented Mar 6, 2015

done :)

@amueller
Copy link
Copy Markdown
Member

amueller commented Mar 6, 2015

But this still does not fix the issue, right? Why is it not deterministic when the graph is not fully connected?

@raghavrv
Copy link
Copy Markdown
Member Author

raghavrv commented Mar 7, 2015

I'll investigate further and let you know!

@raghavrv raghavrv changed the title [MRG] Spectral embedding deterministic 2 [MRG] Make spectral embedding unconditionally deterministic and related fixes. Mar 8, 2015
@landscape-bot
Copy link
Copy Markdown

Code Health
Code quality remained the same when pulling f484f5f on ragv:spectral_embedding_deterministic_2 into ff942f9 on scikit-learn:master.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a special case in this common check? Once the remaining sign switch is fixed (the constant abs values case), SpectralEmbedding should be fully deterministic and there would be no need for a specific dataset in the checks, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently 2 things cause the o/p to be non-deterministic...

Major one is when the data produces an adjacency graph which is not fully connected. (But a warning is raised stating spectral embedding may not behave correctly since the adj. graph is not fully connected, so I am not quite sure if non-deterministic output is to be expected here)

Minor one is when there are multiple max abs values where the sign has to be arbitrarily given to the first index etc... (I'll fix this and push the commit soon...)

So yes if the 1st one is to be fixed then we should not special case the input... :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well but the first should also be deterministic! There is no reason that a non-connected graph gives non-deterministic results.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! So I've (or I think I've) narrowed down the issue to a nearly singular graph laplacian when the graph is not fully connected...

  • arpack's eigen solver seems to produce the non-deterministic eigen vectors when the laplacian matrix has a very low determinant... (order e-50) ( only "lobpcg" handles this perfectly producing deterministic output everytime... "amg" is not quite stable with nearly sigular matrices ( raises LinAlgError randomly ) Wrong observation both "lobpcg" and "amg" also tend to throw errors with nearly / fully singular matrices.)
  • When this particular S_X, S_y is used the determinant of the laplacian is of the order e-20 which seems to be handled perfectly by "arpack", "lobpcg" as well as "amg" eigen_solvers.

Could you kindly confirm if my observations are true?

We could fix this by reverting to "lobpcg" when the determinant of laplacian is very low (but involves an additional determinant computation step).

(Also note that we currently fall back to "lobpcg" solver when arpack raises LigAlgErrorRuntimeError... which is when the laplacian is perfectly singular)

NOTE: I made a wrong observation that lobpcg handles singular matrices perfectly. Both "lobpcg" and "amg" also tend to throw errors with nearly / fully singular matrices ( I hadn't tested it for different random states back then )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the change in the result from the solver? Are the eigenvalues too close? And what is the behavior then?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arpack can take a v0 to seed the random vector. Unfortunately, passing a gaussian distributed v0 is not correct. The correct way to do it would be to replicate:

https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/ARPACK/SRC/dgetv0.f

That looks complicated though. I am no fortran developer myself.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the comments... This is just a minor confirmation that the arpacks eigsh does not raise RuntimeError but instead just gives out random values, which is the root cause of the the current issue at hand...

>>> N = np.array([[0,]*5]*5, dtype=float)
>>> print eigsh(N, k=1)
(array([ 0.]), array([[-0.32529456],
       [ 0.29618396],
       [ 0.794161  ],
       [-0.04909478],
       [-0.41636105]]))
>>> print eigsh(N, k=1)
(array([ 0.]), array([[-0.34134676],
       [-0.71507399],
       [ 0.00610712],
       [-0.58006083],
       [ 0.18879544]]))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming @ragv. Let's try to make the common tests use the deterministic dense solver then.

Note: np.array([[0,]*5]*5, dtype=float) is better written np.zeros(shape=(5, 5), dtype=np.float64).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: np.array([[0,]*5]*5, dtype=float) is better written np.zeros(shape=(5, 5), dtype=np.float64) .

Ah thanks!! :)

Let's try to make the common tests use the deterministic dense solver then.

Do you mean to say that in the set_fast_parameters we could set eigen_solver to "lobpcg"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lobpcg or dense depending on the models. I don't think the notation is consistent.

@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch from f484f5f to b11b002 Compare March 14, 2015 12:13
@raghavrv
Copy link
Copy Markdown
Member Author

@ogrisel Even with "lobpcg", this works only for particular random_states, which is not desirable... Any suggestions? ( Sorry I was wrong in generalizing that "lobpcg" works perfectly for nearly singular matrices, from just a few observations )

@raghavrv
Copy link
Copy Markdown
Member Author

Also copying a chat from gitter -

Do you feel _deterministic_vector_sign_flip should be made public, since svd_flip is public?

@ogrisel - Nope... svd_flip should have probably been private..

Follow up question. Do you want me to change svd_flip --> _svd_flip?

@amueller
Copy link
Copy Markdown
Member

Are the eigenvalues exactly equal to zero? Then we should remove the corresponding eigenvalues, right? That should get rid of the non-determinism.

@amueller
Copy link
Copy Markdown
Member

For the rename: leave it as is currently. our policy on what is public and what is private in utils is a bit... fuzzy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the double underscore?

@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch from b11b002 to c4df19d Compare May 22, 2015 22:41
@raghavrv raghavrv reopened this May 22, 2015
@raghavrv
Copy link
Copy Markdown
Member Author

raghavrv commented Oct 5, 2015

@amueller @ogrisel Any suggestions on how we can proceed on this?

@raghavrv raghavrv changed the title [MRG] Make spectral embedding unconditionally deterministic and related fixes. [WIP] Make spectral embedding unconditionally deterministic and related fixes. Oct 12, 2015
@raghavrv raghavrv force-pushed the spectral_embedding_deterministic_2 branch from ad684c6 to b8b6bb8 Compare November 19, 2015 17:15
@raghavrv raghavrv changed the title [WIP] Make spectral embedding unconditionally deterministic and related fixes. [WIP] Make LaplacianEigenmap unconditionally deterministic and related fixes. Nov 21, 2015
@raghavrv raghavrv changed the title [WIP] Make LaplacianEigenmap unconditionally deterministic and related fixes. [WIP] Make spectral embedding unconditionally deterministic and related fixes. Jan 26, 2017
@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Jul 5, 2018

#4236 was fixed by 6cb51b2.

@ogrisel ogrisel closed this Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make spectral_embedding deterministic

6 participants