Skip to content

[MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR#7632

Merged
jnothman merged 10 commits intoscikit-learn:masterfrom
JPFrancoia:master
Oct 25, 2016
Merged

[MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR#7632
jnothman merged 10 commits intoscikit-learn:masterfrom
JPFrancoia:master

Conversation

@JPFrancoia
Copy link
Copy Markdown
Contributor

Reference Issue

Fix #6032

What does this implement/fix? Explain your changes.

Attribute explained_variance_ratio_ from LinearDiscriminantAnalysis class will be of length n_components (eigen solver).

Any other comments?

This PR follows PR 7616. I mixed up my git history, so it was easier to open a new PR.

LinearDiscriminantAnalysis class will be of length n_components, if
provided. If not provided, will have a maximum length of n_classes. The
attribute will have the same length, whatever the solver (SVD, Eigen).
- Access to public attributes ``.X_`` and ``.y_`` has been deprecated in
:class:`isotonic.IsotonicRegression`. By `Jonathan Arfa`_.

- Length of `explained_variance_ratio` of :clas:`discriminant_analysis.LinearDiscriminantAnalysis`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still cannot be under 0.18.0

:class:`discriminant_analysis.LinearDiscriminantAnalysis` now returns
correct results. By `JPFrancoia`_
:clas:`discriminant_analysis.LinearDiscriminantAnalysis` now returns correct results.
Attribute `explained_variance_ratio` calculated with SVD and Eigen solver are now of
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still cannot be under 0.18.0

@jnothman jnothman added this to the 0.18.1 milestone Oct 10, 2016
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Ok. Under what should it be then ? We are almost there (and no more test failures).

@jnothman
Copy link
Copy Markdown
Member

Ok. Under what should it be then ? We are almost there (and no more test failures).

@amueller, should we create a 0.18.1 heading in what's new in master?

@jnothman
Copy link
Copy Markdown
Member

There's now a 0.18.1 section

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Ok, I put my changes under 0.18.1. I created the sub-section API changes summary under 0.18.1 too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've repeatedly use :clas:. Should be :class:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arf. Sorry to waste your time with a typo.

`#6497 <https://github.com/scikit-learn/scikit-learn/pull/6497>`_
by `Sebastian Säger`_

- Attribute ``explained_variance_ratio`` of :class:`discriminant_analysis.LinearDiscriminantAnalysis`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can waste both of our time a little further by asking that you make the lines somewhat shorter (around 80 chars max) and insert a link to the issue/PR which seems to have been adopted here... Sorry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Done.

@jnothman
Copy link
Copy Markdown
Member

Otherwise LGTM

@jnothman jnothman changed the title Correcting length of explained_variance_ratio_, eigen solver, final PR [MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR Oct 13, 2016

# Get maximum length for explained_variance_ratio_
if self.n_components is None:
max_length_exp = len(self.classes_)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be len(self.classes_) - 1?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If n_components is not provided, then explained_variance_ratio_ will have a length of maximum len(self.classes). So [:max_length_exp]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's a bug, the last one will always be zero.

Copy link
Copy Markdown
Member

@amueller amueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explained_variance_ratio_ definitely needs to be at most of length n_classes - 1

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Please explain why you think it should be len(self.classes_) - 1, I don't understand what you mean.

If I introduce this change, the tests fail:

nosetests -v sklearn.tests.test_discriminant_analysis
sklearn.tests.test_discriminant_analysis.test_lda_predict ... ok
sklearn.tests.test_discriminant_analysis.test_lda_priors ... ok
sklearn.tests.test_discriminant_analysis.test_lda_coefs ... ok
sklearn.tests.test_discriminant_analysis.test_lda_transform ... ok
sklearn.tests.test_discriminant_analysis.test_lda_explained_variance_ratio ... FAIL
sklearn.tests.test_discriminant_analysis.test_lda_orthogonality ... ok
sklearn.tests.test_discriminant_analysis.test_lda_scaling ... ok
sklearn.tests.test_discriminant_analysis.test_qda ... ok
sklearn.tests.test_discriminant_analysis.test_qda_priors ... ok
sklearn.tests.test_discriminant_analysis.test_qda_store_covariances ... ok
sklearn.tests.test_discriminant_analysis.test_qda_regularization ... ok
sklearn.tests.test_discriminant_analysis.test_deprecated_lda_qda_deprecation ... ok
sklearn.tests.test_discriminant_analysis.test_covariance ... ok

======================================================================
FAIL: sklearn.tests.test_discriminant_analysis.test_lda_explained_variance_ratio
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/sklearn/tests/test_discriminant_analysis.py", line 180, in test_lda_explained_variance_ratio
    clf_lda_eigen.explained_variance_ratio_)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/numpy/testing/utils.py", line 918, in assert_array_almost_equal
    precision=decimal)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/numpy/testing/utils.py", line 694, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals

(shapes (3,), (2,) mismatch)
 x: array([  6.037955e-01,   3.962045e-01,   1.787350e-32])
 y: array([ 0.603796,  0.396204])

----------------------------------------------------------------------
Ran 13 tests in 0.112s

FAILED (failures=1)

@amueller
Copy link
Copy Markdown
Member

The test fails because it's also wrong for the other solver imho.

My understanding is that the explained_variance_ratio_ refers to how much variance each of the components explains. However, there can only be n_classes - 1 components, as the components are spanned by the means of the classes.

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

You are right (I double-checked, just to be sure). I fixed it for the 2 solvers, and I updated the tests so in the future the length of this attribute will be checked.


- Length of `explained_variance_ratio` of
:class:`discriminant_analysis.LinearDiscriminantAnalysis`
is now the same for Eigen and SVD solvers. (`#7632
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be more explicit by saying that it changed for both and is now min(n_components, n_classes - 1).

@amueller
Copy link
Copy Markdown
Member

can you maybe add an attribute n_components_ that is computed in fit and just use that? It should also be used in transform which currently has a kind of nonsensical check.

@JPFrancoia
Copy link
Copy Markdown
Contributor Author

There is already an attribute self.n_components. It might be ambigious. Maybe I can call it max_components_ ?

@amueller
Copy link
Copy Markdown
Member

Not really. We sometimes have attributes with names that are the same as parameters (but with a trailing underscore) if the "real" value is computed from the training data. We could do self._n_components, too, if you want to make it private for now.

LinearDiscriminantAnalysis class. Used to determinate the maximum number
of components returned by the class. Also used for the lenght of
explained_variance_ratio_. See PR 7632.

Fixed some stuff to be more PEP8.
@JPFrancoia
Copy link
Copy Markdown
Contributor Author

Hum ok, I decided to call it _max_components, because it is really more explicit. This attribute is only used to store the maximum number of components of the classifier, so let's call a cat a cat.

I made it private for now. Let me know if you agree with these changes.

@amueller
Copy link
Copy Markdown
Member

hm I'm not sure why you feel _max_components is more apt than _n_components. If n_components < n_classes - 1 it's not really the max, right? But I don't really have a strong opinion, it's private anyhow.

@amueller
Copy link
Copy Markdown
Member

LGTM :)

@amueller
Copy link
Copy Markdown
Member

(There's one +1 on the current version which is from me. I think @jnothman's doesn't apply any more because there were substantial changes. maybe he has time to review again)

@jnothman
Copy link
Copy Markdown
Member

LGTM

@jnothman jnothman merged commit f260898 into scikit-learn:master Oct 25, 2016
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 25, 2016
amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 27, 2016
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LDA.explained_variance_ratio_ is of the wrong size

3 participants