Skip to content

[MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver#6027

Closed
JPFrancoia wants to merge 7 commits intoscikit-learn:masterfrom
JPFrancoia:master
Closed

[MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver#6027
JPFrancoia wants to merge 7 commits intoscikit-learn:masterfrom
JPFrancoia:master

Conversation

@JPFrancoia
Copy link
Copy Markdown
Contributor

solver. See issue #5216. However, this attribute is corrected from the
previous commit which might not return the expected values.

solver. See issue #5216. However, this attribute is corrected from the
previous commit which, might not return the expected values.
@agramfort
Copy link
Copy Markdown
Member

this is an API change. You cannot do this. The size of the attribute must stay the same. Otherwise we need a deprecation cycle.

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Ok. Then https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py#L337 is not consistent, and should be:

self.explained_variance_ratio_ = np.sort(evals / np.sum(evals))[::-1][:self.n_components]

@agramfort
Copy link
Copy Markdown
Member

agramfort commented Dec 14, 2015 via email

@hlin117
Copy link
Copy Markdown
Contributor

hlin117 commented Dec 15, 2015

@JPFrancoia: Like @agramfort mentioned, you should change the title of the PR to allude to a bug fix. Thanks!

@JPFrancoia JPFrancoia changed the title Attribute explained_variance ratio is now available for LDA with the SVD Fix #6031: changed calculation of explained_variance_ratio_, SVD solver Dec 15, 2015
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Like that ?

@hlin117
Copy link
Copy Markdown
Contributor

hlin117 commented Dec 15, 2015

@JPFrancoia : Excellent =]

Good luck!

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

I'll make the corrections asap.

djipey added 2 commits December 15, 2015 13:19
LinearDiscriminantAnalysis, SVD solver.
Used the script of @hlin117 in #5216 to generate better test data,
make_blobs was not good enough.
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

@hlin117 , for the regression tests, I used your way to generate the test data, it's efficient.

I think it generates better test data than make_blobs, which typically returns an array like:

[1 a_very_small_value]

With this kind of array, the test would always pass, and wouldn't spot errors. Now it will.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add 4 more spaces here to conform to PEP 8 indentation.
https://www.python.org/dev/peps/pep-0008/#indentation

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

They should theoretically be of the same length anyways.

They are not:

def test_lda_explained_variance_ratio():
    # Test if the sum of the normalized eigen vectors values equals 1,
    # Also tests whether the explained_variance_ratio_ formed by the
    # eigen solver is the same as the explained_variance_ratio_ formed
    # by the svd solver

    state = np.random.RandomState(0)
    X = state.normal(loc=0, scale=100, size=(40, 20))
    y = state.randint(0, 3, size=(40, 1))

    clf_lda_eigen = LinearDiscriminantAnalysis(solver="eigen")
    clf_lda_eigen.fit(X, y)
    assert_almost_equal(clf_lda_eigen.explained_variance_ratio_.sum(), 1.0, 3)

    print("eigen")
    print(len(clf_lda_eigen.explained_variance_ratio_))
    print(clf_lda_eigen.explained_variance_ratio_)

    clf_lda_svd = LinearDiscriminantAnalysis(solver="svd")
    clf_lda_svd.fit(X, y)
    assert_almost_equal(clf_lda_svd.explained_variance_ratio_.sum(), 1.0, 3)

    print("svd")
    print(len(clf_lda_svd.explained_variance_ratio_))
    print(clf_lda_svd.explained_variance_ratio_)

    # NOTE: clf_lda_eigen.explained_variance_ratio_ is not of n_components
    # length. Make it the same length as clf_lda_svd.explained_variance_ratio_
    # before comparison.
    assert_array_almost_equal(clf_lda_svd.explained_variance_ratio_,
            clf_lda_eigen.explained_variance_ratio_)
eigen
20
[  6.03795532e-01   3.96204468e-01   4.19383626e-16   3.08212423e-16
   1.23275796e-16   9.84308055e-17   6.96952083e-17   4.57224815e-17
   3.73609416e-17   2.12787319e-17   9.54458789e-18  -1.83915898e-17
  -4.11632284e-17  -1.08854065e-16  -1.41296422e-16  -1.58466942e-16
  -1.87047513e-16  -2.18274226e-16  -3.00368948e-16  -4.68795358e-16]
svd
3
[  6.03795532e-01   3.96204468e-01   1.23375797e-32]
Traceback (most recent call last):
  File "/home/djipey/informatique/python/scikit-learn/sklearn/tests/test_discriminant_analysis.py", line 368, in <module>
    test_lda_explained_variance_ratio()
  File "/home/djipey/informatique/python/scikit-learn/sklearn/tests/test_discriminant_analysis.py", line 193, in test_lda_explained_variance_ratio
    clf_lda_eigen.explained_variance_ratio_)
  File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 886, in assert_array_almost_equal
    precision=decimal)
  File "/usr/lib/python3.5/site-packages/numpy/testing/utils.py", line 663, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Of course they are not. See #6032

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Ok, corrections made @hlin117 .

@hlin117
Copy link
Copy Markdown
Contributor

hlin117 commented Dec 16, 2015

LGTM. Can you add [MRG] to the title of this PR?

@JPFrancoia JPFrancoia changed the title Fix #6031: changed calculation of explained_variance_ratio_, SVD solver [MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver Dec 17, 2015
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Done. I think we can close #6031 and #5216 when this PR is merged.

@hlin117
Copy link
Copy Markdown
Contributor

hlin117 commented Dec 18, 2015

@JPFrancoia : #5216 has already been merged. Did you mean another PR or issue?

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Hi, I'm actually not sure what the [MRG] tag means. Will this PR be merged one day ? Do I need to do something more ?

@hlin117
Copy link
Copy Markdown
Contributor

hlin117 commented Apr 9, 2016

@JPFrancoia [MRG] means that it is ready to be merged, and it is until review from the scikit-learn team.

@agramfort
Copy link
Copy Markdown
Member

LGTM @JPFrancoia please update the what's new to document the bug fix and we're good

results returned with discriminant_analysis.LinearDiscriminantAnalysis's
attribute explained_variance_ratio.
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Crap. One test failed, and there are some conflicts. What is AppVeyor ? I don't know it.

@agramfort
Copy link
Copy Markdown
Member

merged by rebase after a couple of fixes

0476784
74c29e0
262e9e4

thx @JPFrancoia

@agramfort agramfort closed this Apr 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants