[MRG] MNT Fixes for PCA with n_components='mle' by NicolasHug · Pull Request #16841 · scikit-learn/scikit-learn

NicolasHug · 2020-04-04T20:48:45Z

Fixes #16730
Closes #16546

infer_dimension() now returns a value in [1, n_features - 1]. rank = 0 is no longer possible (this is a bug fix, since transforming with rank 0 would produce an empty array).
The rank passed to _assess_dimension() is now the actual rank, not an index as before. This fixes the infamous off-by-one error. Note also that the method isn't defined for rank == n_features (division by zero).
A few fixes in the formula.
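To make the first two points concrete, here is a minimal sketch of the rank search they describe. The names `infer_rank_sketch` and `score` are hypothetical stand-ins for illustration, not the library's functions; the only behavior taken from the description above is that candidate ranks run from 1 to n_features - 1 and that rank 0 can never be selected.

```python
import numpy as np


def infer_rank_sketch(spectrum, score):
    """Pick the best rank in [1, n_features - 1].

    `score(spectrum, rank)` is a hypothetical log-likelihood-style callable;
    it receives the actual rank, not a zero-based index.
    """
    n_features = len(spectrum)
    # Slot 0 stands for rank 0 and stays at -inf so it is never chosen.
    ll = np.full(n_features, -np.inf, dtype=float)
    # The loop stops at n_features - 1: rank == n_features is not evaluated.
    for rank in range(1, n_features):
        ll[rank] = score(spectrum, rank)
    return int(np.argmax(ll))


# Toy usage with a made-up score that peaks at rank 2.
toy_spectrum = np.array([5.0, 3.0, 1.0, 0.5])
best = infer_rank_sketch(toy_spectrum, lambda s, r: -abs(r - 2))
print(best)
```

Because the array index equals the rank, the value returned by argmax can be used directly as the number of components, which is the point of passing the rank rather than an index.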

NicolasHug · 2020-04-04T21:11:37Z

ping @adrinjalali @jnothman @agramfort @lschwetlick

lschwetlick · 2020-04-07T15:52:27Z

This looks good to me, thanks for fixing!

NicolasHug · 2020-04-07T16:44:07Z

Thanks for checking @lschwetlick . If you have time / feel like it, it'd be great if you could check with your notebook if the results make sense now? (but no worries otherwise)

lschwetlick · 2020-04-07T17:28:28Z

Well... not sure if this is maybe quite edge case-y but

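import numpy as np
# Private helper; this import path assumes scikit-learn >= 0.23 (the release this PR targets).
from sklearn.decomposition._pca import _assess_dimension

b = np.ones((9, 6))  # rank-1 matrix: 9 samples, 6 features
print("rank=", np.linalg.matrix_rank(b))
u2, s2, vh2 = np.linalg.svd(b, full_matrices=True)
ll = np.empty_like(s2)
ll[0] = -np.inf  # rank 0 is never a candidate
for r in range(1, s2.shape[0]):
    ll[r] = _assess_dimension(s2, r, 9)
print(ll)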

-> [-inf nan -inf -inf -inf -inf]

but then funnily enough it gives the right number of dimensions (1), because apparently argmax treats nan > -inf. It's giving nan because if all subsequent values in the spectrum are 0, then in line 76 v is also 0...
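The argmax behaviour mentioned above can be checked in isolation. This is a small NumPy-only sketch using the array printed above; it does not call any scikit-learn code.

```python
import numpy as np

# The log-likelihood array reported above for the rank-1 input.
ll = np.array([-np.inf, np.nan, -np.inf, -np.inf, -np.inf, -np.inf])

# An ordinary comparison with NaN is always False...
nan_is_greater = bool(np.nan > -np.inf)

# ...but np.argmax propagates NaN and returns the index of the first NaN,
# so the NaN entry "wins" over the -inf entries.
idx = int(np.argmax(ll))

# A cheap guard for this situation is to look for NaN before trusting argmax.
has_nan = bool(np.isnan(ll).any())

print(nan_is_greater, idx, has_nan)
```

So the correct answer here is a side effect of NaN propagation rather than of a meaningful likelihood comparison, which is why the case is worth a dedicated test.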

lschwetlick · 2020-04-07T17:39:17Z

I'm also still slightly confused, to be honest: we don't test full rank because the method isn't defined for full rank. But what if the true number of dimensions is full rank?

NicolasHug · 2020-04-07T18:03:25Z

Thanks @lschwetlick , I fixed that and added a test (though indeed it's an unlikely case I think)

> But what if the number of dimensions is full rank?

I guess users use mle when they want some dimensionality reduction but just don't know by how much. So it's fine not to consider rank == n_features (we can't anyway). There's always the option of passing n_components=n_features directly.
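As a rough illustration of the two options in this reply, the sketch below fits PCA once with n_components='mle' and once with an explicit n_components equal to the number of features. The random data, its shape and the seed are arbitrary assumptions; the bound on the 'mle' result assumes a scikit-learn version that includes this fix (0.23 or later).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))  # arbitrary toy data: 200 samples, 5 features
n_features = X.shape[1]

# Let the MLE heuristic choose; it only considers ranks 1 .. n_features - 1.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)

# Keep every component by asking for it explicitly.
pca_full = PCA(n_components=n_features).fit(X)

print(pca_mle.n_components_, pca_full.n_components_)
```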

lschwetlick · 2020-04-07T18:05:56Z

Alright cool, other than that my notebook and I are happy :)

agramfort · 2020-04-07T20:12:54Z

thanks a lot @NicolasHug and @lschwetlick for the team work !

* Fixed off by one in MLE and better handling of small eigenvalues * light update tests * pep8 * Added test + threhsold on small log

Fixed off by one in MLE and better handling of small eigenvalues

728badb

github-actions bot added the module:decomposition label Apr 4, 2020

NicolasHug added this to the 0.23 milestone Apr 4, 2020

NicolasHug added the Blocker label Apr 4, 2020

NicolasHug added 3 commits April 5, 2020 08:10

Merge branch 'master' of github.com:scikit-learn/scikit-learn into fix_pca_mle

9ec987f

light update tests

b4aeb99

pep8

cea7967

NicolasHug added 2 commits April 7, 2020 13:41

Merge branch 'master' of github.com:scikit-learn/scikit-learn into fix_pca_mle

0eb9926

Added test + threhsold on small log

48b94e4

agramfort merged commit a655de5 into scikit-learn:master Apr 7, 2020

oleksandr-pavlyk mentioned this pull request Apr 27, 2020

adjusted PCA code to changes in sklearn 0.23 uxlfoundation/scikit-learn-intelex#208

Merged

rth mentioned this pull request Jul 30, 2020

Test test_assess_dimesion_rank_one fails on multiple architectures #18031

Closed

agramfort merged 6 commits into scikit-learn:master from NicolasHug:fix_pca_mle
