[MRG] MNT Fixes for PCA with n_components='mle'#16841
[MRG] MNT Fixes for PCA with n_components='mle'#16841agramfort merged 6 commits intoscikit-learn:masterfrom
Conversation
|
This looks good to me, thanks for fixing! |
|
Thanks for checking @lschwetlick . If you have time / feel like it, it'd be great if you could check with your notebook if the results make sense now? (but no worries otherwise) |
|
Well... not sure if this is maybe quite edge case-y but -> [-inf nan -inf -inf -inf -inf] but then funnily enough it gives the right number of dimensions (1) because apparently argmax treats nan>-inf. Its giving nan because if all subsequent values in the spectrum are 0 then in line 76 |
|
I'm also still slightly confused to be honest: we dont test full rank because the method isn't defined for testing full rank. But what if the number of dimensions is full rank? |
|
Thanks @lschwetlick , I fixed that and added a test (though indeed it's an unlikely case I think)
I guess users use mle when they want some dimensionality reduction, but just don't know by how much. So that's fine not to consider rank==n_features (we can't anyway). There's always the solution to directly provide n_components=n_features. |
|
Alright cool, other than that my notebook and I are happy :) |
|
thanks a lot @NicolasHug and @lschwetlick for the team work ! |
* Fixed off by one in MLE and better handling of small eigenvalues * light update tests * pep8 * Added test + threhsold on small log
Fixes #16730
Closes #16546
infer_dimension()now returns a value in[1, n_features - 1]. rank = 0 isn't possible anymore (it's a bug fix, as this would transform into an empty array).rankas passed to_assess_dimension()is actually the rank, not an index as previously. This fixes the infamous off-by-one error. Note also that the method isn't defined if we pass it rank == n_features (division by zero)