[MRG] Fixed NMF IndexError#11667
Conversation
TomDLT
left a comment
There was a problem hiding this comment.
LGTM except nitpicks.
Can you also add a bugfix entry in doc/whats_new/v0.20.rst?
Thanks !
sklearn/decomposition/nmf.py
Outdated
| check_non_negative(X, "NMF initialization") | ||
| n_samples, n_features = X.shape | ||
|
|
||
| if init == 'nndsvd' and n_components > min(n_samples, n_features): |
There was a problem hiding this comment.
The NNDSVD is performed also with init = 'nndsvda' and init = 'nndsvdar', which also need the same constraint.
You can use if init != 'random' to handle all three cases.
Please also update your unit test accordingly.
sklearn/decomposition/nmf.py
Outdated
|
|
||
| def non_negative_factorization(X, W=None, H=None, n_components=None, | ||
| init='random', update_H=True, solver='cd', | ||
| init=None, update_H=True, solver='cd', |
There was a problem hiding this comment.
Why are we changing this default init?
There was a problem hiding this comment.
Sorry for the super late reply and thanks for pointing this out. I should have explain it better in the PR notes.
I made this changes because the documentation says here
Default: 'nndsvd' if n_components <= min(n_samples, n_features),
otherwise random.
But in init is set to 'nndsvd' or 'random' only if init is None. See here
if init is None:
if n_components <= min(n_samples, n_features):
init = 'nndsvd'
else:
init = 'random'
There was a problem hiding this comment.
Then we need to update the documentation to reflect the true default, not make a backwards-incompatible change. You are welcome to update the documentation in a separate, focused PR.
There was a problem hiding this comment.
Got it. Let me revert that and create a new PR on that.
sklearn/decomposition/nmf.py
Outdated
| check_non_negative(X, "NMF initialization") | ||
| n_samples, n_features = X.shape | ||
|
|
||
| if (init and init != 'random' |
There was a problem hiding this comment.
always use init is not None rather than just testing init as a bool
doc/whats_new/v0.20.rst
Outdated
| - |Feature| A scorer based on :func:`metrics.brier_score_loss` is also | ||
| available. :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`. | ||
|
|
||
| - Fixed a bug in :class:`decomposition.NMF` where `init = 'nndsvd'`, |
There was a problem hiding this comment.
This is in the wrong place and should be prefixed by |Fix|
|
@jnothman Thank you for your quick turnaround. The test is failing because Per your comment above, we want to update the documentation instead of making a backwards-incompatible change. However, not changing the code means that I am wondering what is your suggestion on this. |
|
I am okay to leave them inconsistent for now, or we could change the
default in `non_negative_factorization` with a deprecation process (see
http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
).
|
|
Tests are now failing |
doc/whats_new/v0.21.rst
Outdated
| `n_components < n_features` instead of | ||
| `n_components <= min(n_samples, n_features)`. | ||
| :issue:`11650` by :user:`Hossein Pourbozorg <hossein-pourbozorg>` and | ||
| `Zijie (ZJ) Poh <zjpoh>`. |
|
Thanks @zjpoh ! |
This reverts commit 147745e.
This reverts commit 147745e.
Reference Issues/PRs
Fixes #11650
What does this implement/fix? Explain your changes.
I changed the condition for using$A\in\mathbb{R}^{m\times n}_+$ , the condition for
nndsvdto initialize NMF to k <= min(m,n) and raised an ValueError if k > min(m,n) andinit = 'nndsvd'.Why?
According to the table 1 of SVD based initialization paper, for matrix
nndsvdinitialization is k < min(m, n). However, in the code,nndsvdis used when k < n instead of k < min(m, n).Note that in the code, k = n_components, m = n_samples and n = n_features.
In addition, from my understanding of the paper, k <= min(m,n) is a sufficient condition because the initialization method is based on SVD and SVD only requires k <= min(m, n) instead of k < min(m, n). I have verified that setting k <= min(m, n) gives the correct solution and passes all unit tests.
I set
init = Noneas the default parameter ofnon_negative_factorization.Why?
non_negative_factorizationhasinit = 'random'[here] while_initialize_nmfsetsinit = 'nndsvd'only if it is called withinit = None[here]. That is, the calculation ofnon_negative_factorizationwith the default parameter will never useinit = 'nndsvd'. Hence, the output ofnon_negative_factorizationandNMFclass will be different unlessinit = 'random'is passed as a parameter to NMF.Note:
non_negative_factorizationhaveinit = 'random'as the default parameter is also inconsistent with the documentation.Any other comments?
I am aware that the changes that I made are inconsistent with the suggestions in the issue. But this is what I found out after reading the code and the paper. Please let me know if this makes sense. Thanks.