[MRG] KernelPCA: fix transform issue when zero eigenvalues are present and not removed (issue 12141)#12143
Conversation
- Added a few comments to clarify `_fit_transform`, `fit_transform` and `transform`. - Made the numpy coding style uniform across `fit` and `fit_transform`.
All set, waiting for review.
…s where there is no zero division, to avoid numpy warnings.
…o zero division numpy warning.
…ture to perform fine-grain warning assertion
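The zero-division fix these commit messages refer to can be sketched as follows (a minimal illustration with assumed names `alphas`/`lambdas` and a hypothetical helper, not scikit-learn's actual implementation): only divide by the eigenvalues where they are non-zero, so numpy never has to emit a divide-by-zero warning.

```python
import numpy as np

def scale_by_nonzero_eigenvalues(alphas, lambdas):
    """Divide each eigenvector column by the square root of its
    eigenvalue, but only where the eigenvalue is non-zero; columns
    with a zero eigenvalue are left at zero instead of becoming
    nan/inf. Hypothetical sketch, not the scikit-learn code."""
    scaled = np.zeros_like(alphas)
    nonzero = lambdas > 0
    scaled[:, nonzero] = alphas[:, nonzero] / np.sqrt(lambdas[nonzero])
    return scaled

alphas = np.array([[2.0, 5.0],
                   [6.0, 7.0]])
lambdas = np.array([4.0, 0.0])
result = scale_by_nonzero_eigenvalues(alphas, lambdas)
print(result)  # first column divided by 2, second column stays zero
```

Because the indexing mask skips the zero eigenvalues entirely, no division by zero ever happens, as opposed to computing the full quotient and masking afterwards.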
Thanks @smarie, LGTM. Let's wait for other reviews now.
Thanks @NicolasHug. If you still have some time for reviews in the next few days, I would really appreciate a review of the companion PRs of this PR:
Your #12069 (comment) makes me think we should not test for strict equality with zero but instead use an approximate comparison.
Actually that's already taken care of, but in #12145 :), where we round down to zero the small eigenvalues that are due to bad numerical conditioning (as well as the negative eigenvalues, because a kernel is supposed to be positive semi-definite, so they are probably numerical errors). See https://github.com/scikit-learn/scikit-learn/pull/12145/files#diff-3b70045c110a30d29de66ed0ea3fb86dR1082 for details.
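For reference, the eigenvalue rounding discussed above could look roughly like this (a hedged sketch with an assumed relative tolerance and a hypothetical helper name, not the actual code in #12145): eigenvalues that are negative, or negligible relative to the largest one, are set to exactly zero.

```python
import numpy as np

def round_negligible_eigenvalues(lambdas, rtol=1e-12):
    """Set eigenvalues that are negative or negligibly small relative
    to the largest eigenvalue to exactly zero. Negative eigenvalues of
    a kernel matrix are treated as numerical error, since a kernel is
    supposed to be positive semi-definite. Hypothetical sketch; the
    tolerance value is an assumption, not the one used in the PR."""
    significant = lambdas > lambdas.max() * rtol
    return np.where(significant, lambdas, 0.0)

lambdas = np.array([5.0, 1e-15, -1e-16])
print(round_negligible_eigenvalues(lambdas))  # -> [5. 0. 0.]
```

Rounding these values to exactly zero is what lets downstream code (such as the transform fix in this PR) test `lambdas > 0` reliably instead of comparing against an ad-hoc threshold in several places.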
OK. I guess it would make sense to do it here? IMHO the way to go would be to first merge this one, then #12145, and then #12069. This way you can merge master each time to avoid redundant changes in the diffs (right now the #12145 diff also shows the changes from here, and that makes reviewing trickier).
Yes, this is the merge order that should be used, since each PR contains and relies on the commits from the previous one. I am not sure it would make sense to backport the zero-eigenvalue rounding here: that is precisely what #12145 is about: controlling and harmonizing the eigenvalues found by the various solvers.
You're right, if #12145 sets small eigenvalues to 0 then the code here won't have to change.
Hi @NicolasHug, @jnothman, @ogrisel: any update on this PR and the subsequent ones, #12145 and #12069? Experience shows that the longer we wait, the more likely the PRs will no longer be mergeable... Of course I understand that this is a volunteer-based open-source project with many contributions, but I really believe that the effort we collectively put into these three PRs, both in terms of the idea (@grilling original code in matlab), coding, and code review, has made them mature now.
Maybe @adrinjalali and @qinhanmin2014 could review this? It should be quick to review IMO (LGTM already), and it would be nice to acknowledge @smarie's great efforts ;)
thomasjpfan
left a comment
A few nits about the comments. Otherwise LGTM.
Co-Authored-By: smarie <sylvain.marie@schneider-electric.com>
This is +2 now, @thomasjpfan can you merge it please?
Thanks everyone!
…resent and not removed (scikit-learn#12143)" This reverts commit ba2bb79.
This fixes #12141