[MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver#6027
[MRG] Fix #6031: changed calculation of explained_variance_ratio_, SVD solver#6027JPFrancoia wants to merge 7 commits intoscikit-learn:masterfrom JPFrancoia:master
Conversation
solver. See issue #5216. However, this attribute is corrected from the previous commit which, might not return the expected values.
|
this is an API change. You cannot do this. The size of the attribute must stay the same. Otherwise we need a deprecation cycle. |
|
Ok. Then https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py#L337 is not consistent, and should be: self.explained_variance_ratio_ = np.sort(evals / np.sum(evals))[::-1][:self.n_components] |
|
ok please update the unit tests to show this issue and present the PR as a
fix.
thx
|
|
@JPFrancoia: Like @agramfort mentioned, you should change the title of the PR to allude to a bug fix. Thanks! |
|
Like that ? |
|
@JPFrancoia : Excellent =] Good luck! |
|
I'll make the corrections asap. |
LinearDiscriminantAnalysis, SVD solver.
|
@hlin117 , for the regression tests, I used your way to generate the test data, it's efficient. I think it generates better test data than make_blobs, which typically returns an array like: [1 a_very_small_value] With this kind of array, the test would always pass, and wouldn't spot errors. Now it will. |
sklearn/discriminant_analysis.py
Outdated
There was a problem hiding this comment.
Add 4 more spaces here to conform to PEP 8 indentation.
https://www.python.org/dev/peps/pep-0008/#indentation
They are not: def test_lda_explained_variance_ratio():
# Test if the sum of the normalized eigen vectors values equals 1,
# Also tests whether the explained_variance_ratio_ formed by the
# eigen solver is the same as the explained_variance_ratio_ formed
# by the svd solver
state = np.random.RandomState(0)
X = state.normal(loc=0, scale=100, size=(40, 20))
y = state.randint(0, 3, size=(40, 1))
clf_lda_eigen = LinearDiscriminantAnalysis(solver="eigen")
clf_lda_eigen.fit(X, y)
assert_almost_equal(clf_lda_eigen.explained_variance_ratio_.sum(), 1.0, 3)
print("eigen")
print(len(clf_lda_eigen.explained_variance_ratio_))
print(clf_lda_eigen.explained_variance_ratio_)
clf_lda_svd = LinearDiscriminantAnalysis(solver="svd")
clf_lda_svd.fit(X, y)
assert_almost_equal(clf_lda_svd.explained_variance_ratio_.sum(), 1.0, 3)
print("svd")
print(len(clf_lda_svd.explained_variance_ratio_))
print(clf_lda_svd.explained_variance_ratio_)
# NOTE: clf_lda_eigen.explained_variance_ratio_ is not of n_components
# length. Make it the same length as clf_lda_svd.explained_variance_ratio_
# before comparison.
assert_array_almost_equal(clf_lda_svd.explained_variance_ratio_,
clf_lda_eigen.explained_variance_ratio_) |
|
Of course they are not. See #6032 |
|
Ok, corrections made @hlin117 . |
|
LGTM. Can you add [MRG] to the title of this PR? |
|
@JPFrancoia : #5216 has already been merged. Did you mean another PR or issue? |
|
Hi, I'm actually not sure what the [MRG] tag means. Will this PR be merged one day ? Do I need to do something more ? |
|
@JPFrancoia [MRG] means that it is ready to be merged, and it is until review from the scikit-learn team. |
|
LGTM @JPFrancoia please update the what's new to document the bug fix and we're good |
results returned with discriminant_analysis.LinearDiscriminantAnalysis's attribute explained_variance_ratio.
|
Crap. One test failed, and there are some conflicts. What is AppVeyor ? I don't know it. |
|
merged by rebase after a couple of fixes thx @JPFrancoia |
solver. See issue #5216. However, this attribute is corrected from the
previous commit which might not return the expected values.