[MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver by JPFrancoia · Pull Request #6027 · scikit-learn/scikit-learn

JPFrancoia · 2015-12-14T19:17:40Z

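Attribute explained_variance_ratio_ is now available for LDA with the SVD solver. See issue #5216. However, this commit corrects how the attribute is computed compared with the previous commit, which might not return the expected values.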


agramfort · 2015-12-14T19:31:21Z

this is an API change. You cannot do this. The size of the attribute must stay the same. Otherwise we need a deprecation cycle.


JPFrancoia · 2015-12-14T20:09:39Z

Ok. Then https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py#L337 is not consistent, and should be:

```python
self.explained_variance_ratio_ = np.sort(evals / np.sum(evals))[::-1][:self.n_components]
```
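The expression above normalizes the eigenvalues, sorts them in decreasing order and keeps the first `n_components`. The following is a minimal numpy sketch of why that matches an SVD-based ratio once both are cut to the same length. It assumes a small random matrix `M` as a hypothetical stand-in for the between-class matrix, and `n_components = 2` (i.e. `n_classes - 1` for three classes) is also an assumption. This is not the exact computation either solver performs (the eigen solver solves a generalized eigenproblem). It only illustrates that the eigenvalues of `M.T @ M` are the squared singular values of `M` plus zeros, and both sets sum to the same total.

```python
import numpy as np

# Hypothetical stand-in: 3 rows (think one per class), 20 features.
rng = np.random.RandomState(0)
M = rng.normal(size=(3, 20))
n_components = 2  # assumption: n_classes - 1 for 3 classes

# SVD route: squared singular values normalized to sum to 1, then truncated.
S = np.linalg.svd(M, compute_uv=False)
ratio_svd = (S ** 2 / np.sum(S ** 2))[:n_components]

# Eigen route, same expression as above: normalize, sort in decreasing
# order, truncate. M.T @ M has 20 eigenvalues, 17 of them numerically zero.
evals = np.linalg.eigvalsh(M.T @ M)
ratio_eigen = np.sort(evals / np.sum(evals))[::-1][:n_components]

print(ratio_svd)
print(ratio_eigen)
print(np.allclose(ratio_svd, ratio_eigen))  # the two routes agree
```

Without the final `[:n_components]` slice, the eigen route would return all 20 entries, most of them tiny or slightly negative noise.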

agramfort · 2015-12-14T20:38:40Z

ok please update the unit tests to show this issue and present the PR as a fix. thx

hlin117 · 2015-12-15T09:55:02Z

@JPFrancoia: Like @agramfort mentioned, you should change the title of the PR to allude to a bug fix. Thanks!

JPFrancoia · 2015-12-15T10:01:09Z

Like that?

hlin117 · 2015-12-15T10:06:06Z

@JPFrancoia : Excellent =]

Good luck!

JPFrancoia · 2015-12-15T10:20:57Z

I'll make the corrections asap.


JPFrancoia · 2015-12-15T14:22:35Z

@hlin117, for the regression tests I used your way of generating the test data; it's efficient.

I think it generates better test data than make_blobs. With make_blobs, explained_variance_ratio_ typically comes out as an array like:

[1 a_very_small_value]

With this kind of array, the test would always pass, and wouldn't spot errors. Now it will.
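The argument can be checked with a short numpy sketch. It compares a correct ratio (squared singular values, normalized) with a hypothetical buggy one that forgets to square them. The bug is invented for illustration and is not the one fixed in this PR. With a near-degenerate spectrum like `[1 a_very_small_value]`, the 6-decimal comparison cannot tell the two apart. A spread-out spectrum exposes the difference.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def almost_equal(actual, desired, decimal=6):
    """Return True if assert_array_almost_equal accepts the pair."""
    try:
        assert_array_almost_equal(actual, desired, decimal=decimal)
        return True
    except AssertionError:
        return False


def ratio(values):
    # Normalize so the entries sum to 1.
    values = np.asarray(values, dtype=float)
    return values / values.sum()


# Hypothetical singular values: a near-degenerate spectrum and a spread one.
degenerate = np.array([1.0, 1e-9])
spread = np.array([1.0, 0.8])

for S in (degenerate, spread):
    correct = ratio(S ** 2)  # ratio of squared singular values
    buggy = ratio(S)         # hypothetical bug: singular values not squared
    print(S.tolist(), "test passes:", almost_equal(buggy, correct))
```

Only the spread-out case makes the wrong formula fail, which is why test data with several non-negligible components is more useful.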

hlin117 · 2015-12-15T14:46:37Z

sklearn/discriminant_analysis.py

Add 4 more spaces here to conform to PEP 8 indentation.
https://www.python.org/dev/peps/pep-0008/#indentation

JPFrancoia · 2015-12-15T15:26:20Z

They should theoretically be of the same length anyways.

They are not:

```python
import numpy as np
from numpy.testing import assert_almost_equal, assert_array_almost_equal

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def test_lda_explained_variance_ratio():
    # Test if the sum of the normalized eigenvalues equals 1.
    # Also test whether the explained_variance_ratio_ formed by the
    # eigen solver is the same as the explained_variance_ratio_ formed
    # by the svd solver.

    state = np.random.RandomState(0)
    X = state.normal(loc=0, scale=100, size=(40, 20))
    y = state.randint(0, 3, size=(40, 1))

    clf_lda_eigen = LinearDiscriminantAnalysis(solver="eigen")
    clf_lda_eigen.fit(X, y)
    assert_almost_equal(clf_lda_eigen.explained_variance_ratio_.sum(), 1.0, 3)

    # Debug output: length and content of the eigen solver's ratios
    print("eigen")
    print(len(clf_lda_eigen.explained_variance_ratio_))
    print(clf_lda_eigen.explained_variance_ratio_)

    clf_lda_svd = LinearDiscriminantAnalysis(solver="svd")
    clf_lda_svd.fit(X, y)
    assert_almost_equal(clf_lda_svd.explained_variance_ratio_.sum(), 1.0, 3)

    # Debug output: length and content of the svd solver's ratios
    print("svd")
    print(len(clf_lda_svd.explained_variance_ratio_))
    print(clf_lda_svd.explained_variance_ratio_)

    # NOTE: clf_lda_eigen.explained_variance_ratio_ is not of n_components
    # length. Make it the same length as clf_lda_svd.explained_variance_ratio_
    # before comparison.
    assert_array_almost_equal(clf_lda_svd.explained_variance_ratio_,
                              clf_lda_eigen.explained_variance_ratio_)


test_lda_explained_variance_ratio()
```

```
eigen
20
[  6.03795532e-01   3.96204468e-01   4.19383626e-16   3.08212423e-16
   1.23275796e-16   9.84308055e-17   6.96952083e-17   4.57224815e-17
   3.73609416e-17   2.12787319e-17   9.54458789e-18  -1.83915898e-17
  -4.11632284e-17  -1.08854065e-16  -1.41296422e-16  -1.58466942e-16
  -1.87047513e-16  -2.18274226e-16  -3.00368948e-16  -4.68795358e-16]
svd
3
[  6.03795532e-01   3.96204468e-01   1.23375797e-32]
Traceback (most recent call last):
  File "/home/djipey/informatique/python/scikit-learn/sklearn/tests/test_discriminant_analysis.py", line 368, in <module>
    test_lda_explained_variance_ratio()
  File "/home/djipey/informatique/python/scikit-learn/sklearn/tests/test_discriminant_analysis.py", line 193, in test_lda_explained_variance_ratio
    clf_lda_eigen.explained_variance_ratio_)
  File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 886, in assert_array_almost_equal
    precision=decimal)
  File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 663, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals
```
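Following the NOTE in the test, here is a minimal sketch of the comparison restricted to the components both arrays share. It uses the values printed by the run above. The trailing eigen-solver entries are numerical noise around zero, so once both arrays have the same length they agree to 6 decimals.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

# Values copied from the printed output of the run above.
eigen_ratio = np.array([
    6.03795532e-01, 3.96204468e-01, 4.19383626e-16, 3.08212423e-16,
    1.23275796e-16, 9.84308055e-17, 6.96952083e-17, 4.57224815e-17,
    3.73609416e-17, 2.12787319e-17, 9.54458789e-18, -1.83915898e-17,
    -4.11632284e-17, -1.08854065e-16, -1.41296422e-16, -1.58466942e-16,
    -1.87047513e-16, -2.18274226e-16, -3.00368948e-16, -4.68795358e-16,
])
svd_ratio = np.array([6.03795532e-01, 3.96204468e-01, 1.23375797e-32])

# Compare only the leading components present in both arrays.
n = min(len(eigen_ratio), len(svd_ratio))
assert_array_almost_equal(svd_ratio[:n], eigen_ratio[:n])
print("equal to 6 decimals over the first", n, "components")
```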

JPFrancoia · 2015-12-15T15:39:15Z

Of course they are not. See #6032


JPFrancoia · 2015-12-16T08:30:42Z

Ok, corrections made, @hlin117.

hlin117 · 2015-12-16T22:37:51Z

LGTM. Can you add [MRG] to the title of this PR?

JPFrancoia · 2015-12-17T15:41:28Z

Done. I think we can close #6031 and #5216 when this PR is merged.

hlin117 · 2015-12-18T08:55:24Z

@JPFrancoia : #5216 has already been merged. Did you mean another PR or issue?

JPFrancoia · 2016-04-05T21:30:49Z

Hi, I'm actually not sure what the [MRG] tag means. Will this PR be merged one day? Do I need to do something more?

hlin117 · 2016-04-09T03:48:32Z

@JPFrancoia [MRG] means that it is ready to be merged, and it stays in that state until the scikit-learn team reviews it.

agramfort · 2016-04-09T12:05:39Z

LGTM. @JPFrancoia, please update the what's new file to document the bug fix, and we're good.


JPFrancoia · 2016-04-10T14:58:25Z

Crap. One test failed, and there are some conflicts. What is AppVeyor? I don't know it.

agramfort · 2016-04-10T15:40:13Z

merged by rebase after a couple of fixes

0476784
74c29e0
262e9e4

thx @JPFrancoia

abcbe70: Attribute explained_variance ratio is now available for LDA with the SVD solver. See issue #5216. However, this attribute is corrected from the previous commit which, might not return the expected values.

6e0323d: Corrected the dimension of the array explained_variance_ratio_ for LDA, solver svd.

JPFrancoia mentioned this pull request Dec 15, 2015

Creation of the attribute LDA.explained_variance_ratio_, for the eige… #5216

Merged

JPFrancoia changed the title ~~Attribute explained_variance ratio is now available for LDA with the SVD~~ Fix #6031: changed calculation of explained_variance_ratio_, SVD solver Dec 15, 2015

djipey added 2 commits December 15, 2015 13:19

d0513a0: Cleaned the calculation of explained_variance_ratio_ in LinearDiscriminantAnalysis, SVD solver.

898c76a: Regression test for explained_variance_ratio_, LDA, SVD solver. Used the script of @hlin117 in #5216 to generate better test data, make_blobs was not good enough.

hlin117 reviewed Dec 15, 2015

3210392: Added 4 spaces to conform to PEP8.

3ba769f: Modified the length of the slice: now uses a variable, not a hardcoded number.

JPFrancoia mentioned this pull request Dec 17, 2015

Precision errors in LDA.explained_variance_ratio_ #6031

Closed

JPFrancoia changed the title ~~Fix #6031: changed calculation of explained_variance_ratio_, SVD solver~~ [MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver Dec 17, 2015

9b4447c: Modified the What's new file to mention the bug fix about the wrong results returned with discriminant_analysis.LinearDiscriminantAnalysis's attribute explained_variance_ratio.

agramfort closed this Apr 10, 2016

3 participants