[MRG+1] Made PCA expose the singular values#7685
[MRG+1] Made PCA expose the singular values#7685jnothman merged 28 commits intoscikit-learn:masterfrom
Conversation
…variable as per Issue scikit-learn#6955.
amueller
left a comment
There was a problem hiding this comment.
Looks good apart from minor comments. I'm not sure whether we want a test or not.
| to 1.0 | ||
| to 1.0. | ||
|
|
||
| singular_values_ : array, [n_components] |
There was a problem hiding this comment.
please say array, shape (n_components,)
| self.singular_values_ = None | ||
| self.explained_variance_ = None | ||
| self.explained_variance_ratio_ = None | ||
| self.singular_values_ = None |
There was a problem hiding this comment.
Is there any reason to add this here?
There was a problem hiding this comment.
Not really, but the other attributes were set there, so I thought it would be good for consistency to have the singular values be set there as well.
sklearn/decomposition/pca.py
Outdated
| If ``n_components`` is not set then all components are stored and the | ||
| sum of explained variances is equal to 1.0. | ||
|
|
||
| singular_values_ : array, [n_components] |
There was a problem hiding this comment.
hm here all the docstrings seem to be using a different convention. I like shape + tuple better, that's the standard numpy way. But I don't mind that much.
There was a problem hiding this comment.
I've updated them all to follow the numpy standard.
sklearn/decomposition/pca.py
Outdated
| self.explained_variance_ = exp_var = (S ** 2) / n_samples | ||
| full_var = np.var(X, axis=0).sum() | ||
| self.explained_variance_ratio_ = exp_var / full_var | ||
| self.singular_values_ = S.copy() # Store the singular values. |
There was a problem hiding this comment.
No need to copy here. I've removed it.
|
|
||
| singular_values_ : array, [n_components] | ||
| The singular values corresponsing to each of the selected components. | ||
| The singular values corresponds to the 2-norms of the ``n_components`` |
There was a problem hiding this comment.
Maybe say are the 2-norms or are equal to. You just used "corresponding". Also, there's a typo in "corresponding".
| explained_variance_ = (S ** 2) / n_samples | ||
| total_var = explained_variance_.sum() | ||
| explained_variance_ratio_ = explained_variance_ / total_var | ||
| singular_values_ = S.copy() # Store the singular values. |
There was a problem hiding this comment.
Can't this be calculated by the user as np.sqrt(explained_variance_ * n_samples)?
There was a problem hiding this comment.
Sorry. Stupid question. I've read the issue and figure this is all about making something comparable available in TruncatedSVD.
|
Please add tests. |
|
Do you mean to add more doc tests, or to add a unit test? |
|
unit test |
…into pca_expose_singular_values
|
Ok, I've added unit tests for the singular values to PCA, IncrementalPCA and TruncatedSVD. |
…into pca_expose_singular_values
|
Thanks. The test are failing though. |
jnothman
left a comment
There was a problem hiding this comment.
LGTM and thanks for the great tests! I only wonder if we should be refactoring the tests.
|
yeah it would be nice not to duplicate the code as much, bot otherwise LGTM. |
|
@tomlof please add an entry in what's new and let us know so we can merge. |
|
Ok. I've updated the pull request description so that it mentions the unit tests. |
|
What's new is |
| random_state=42, tol=0.0) | ||
| >>> print(svd.explained_variance_ratio_) # doctest: +ELLIPSIS | ||
| [ 0.0782... 0.0552... 0.0544... 0.0499... 0.0413...] | ||
| [ 0.0606... 0.0584... 0.0497... 0.0434... 0.0372...] |
There was a problem hiding this comment.
Someone else had changed those doctests. You see this in my merge commit 279fd60 above. I reverted the change, but when I ran the tests, it failed and I had to change back to what it was I pulled from the main repo. This change back was included in ae86e2f.
I didn't check what had changed to make the unit test change, but just assumed that since it was included in the main repo it was validated. Perhaps someone updated sparse_random_matrix?
There was a problem hiding this comment.
It appears to be the product of a change in sample_without_replacement in commit edc9e7. Let me know if there is anything else you want me to change.
There was a problem hiding this comment.
Oh you're saying it was a merge error on your part and it's since been fixed to reflect master. All good, then.
doc/whats_new.rst
Outdated
| - :class:`decomposition.PCA`, :class:`decomposition.IncrementalPCA` and | ||
| :class:`decomposition.TruncatedSVD` now expose the singular values | ||
| from the underlying SVD. They are stored in the attribute | ||
| `singular_values_`, like in :class:`decomposition.IncrementalPCA`. |
There was a problem hiding this comment.
fixed-width font requires double-backticks in RST
There was a problem hiding this comment.
Thanks, I've updated.
…into pca_expose_singular_values
|
Thanks @tomlof |
|
This seems to have cause a test failure in master: #7893 |
Reference Issue
Fixes #6955.
What does this implement/fix? Explain your changes.
I've made it so that the singular values from the underlying SVD are stored in the PCA classes under the attribute "singular_values_".
This was a trivial fix, and all I did was to save the singular values in an instance variable. I also added the attribute to the list of attributes in the documentations and added doc tests for them. I also added unit tests (
test_singular_values) that cover the estimators PCA, IncrementalPCA and TruncatedSVD to the corresponding tests in sklearn/decomposition/tests.Any other comments?
It may appear like there are loads of commits in this fix, but the commits listed below are some old stuff I was working on years ago, that are no longer in my fork. This fix only add changes to the modules: "decomposition/incremental_pca.py", "decomposition/pca.py" and "decomposition/truncated_pca.py" and to their corresponding unit tests, and the changes are made on the latest commit to the main scikit-learn repository.