[MRG] Fixed NMF IndexError by zjpoh · Pull Request #11667 · scikit-learn/scikit-learn

zjpoh · 2018-07-24T05:44:55Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

I changed the condition for using nndsvd to initialize NMF to k <= min(m, n), and raise a ValueError if k > min(m, n) and init = 'nndsvd'.
Why?
According to Table 1 of the SVD-based initialization paper, for a matrix $A\in\mathbb{R}^{m\times n}_+$, the condition for nndsvd initialization is k < min(m, n). However, in the code, nndsvd is used when k < n instead of k < min(m, n).
Note that in the code, k = n_components, m = n_samples and n = n_features.
In addition, from my understanding of the paper, k <= min(m, n) is a sufficient condition, because the initialization method is based on SVD, and SVD only requires k <= min(m, n) rather than k < min(m, n). I have verified that using k <= min(m, n) gives the correct solution and passes all unit tests.
I set init = None as the default parameter of non_negative_factorization.
Why?
non_negative_factorization has init = 'random' [here], while _initialize_nmf sets init = 'nndsvd' only if it is called with init = None [here]. That is, non_negative_factorization with its default parameters will never use init = 'nndsvd'. Hence, the outputs of non_negative_factorization and the NMF class will differ unless init = 'random' is passed as a parameter to NMF.
Note: non_negative_factorization having init = 'random' as its default is also inconsistent with the documentation.
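
The k <= min(m, n) bound above can be read off the shape of the SVD itself. A short derivation, using standard linear algebra rather than anything quoted from the paper:

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad r = \min(m, n),
\qquad A \in \mathbb{R}^{m \times n}_{+}.
```

There are only min(m, n) singular triplets $(\sigma_i, u_i, v_i)$. An initialization that builds the j-th column of W and the j-th row of H from the j-th triplet is therefore defined for j = 1, ..., k only when k <= min(m, n). Asking for k > min(m, n) would index a triplet that does not exist, which is presumably where the IndexError in the title comes from.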

Any other comments?

I am aware that the changes that I made are inconsistent with the suggestions in the issue. But this is what I found out after reading the code and the paper. Please let me know if this makes sense. Thanks.

jnothman

Please add a test

TomDLT

LGTM except nitpicks.

Can you also add a bugfix entry in doc/whats_new/v0.20.rst?

Thanks !

TomDLT · 2018-07-25T16:48:55Z

sklearn/decomposition/nmf.py

    check_non_negative(X, "NMF initialization")
    n_samples, n_features = X.shape

+    if init == 'nndsvd' and n_components > min(n_samples, n_features):


The NNDSVD is performed also with init = 'nndsvda' and init = 'nndsvdar', which also need the same constraint.
You can use if init != 'random' to handle all three cases.
Please also update your unit test accordingly.

Good point. Thanks!
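
A minimal sketch of the check being discussed, covering all three NNDSVD variants. The helper name `check_nndsvd_components` and the exact error message are hypothetical; only the condition itself comes from this thread. The `None` guard is an assumption, so that an unset `init` is left for the default logic to resolve.

```python
def check_nndsvd_components(init, n_components, n_samples, n_features):
    """Hypothetical helper: reject SVD-based inits when k > min(m, n)."""
    # Anything other than 'random' (nndsvd, nndsvda, nndsvdar) relies on the SVD.
    uses_svd = init is not None and init != "random"
    if uses_svd and n_components > min(n_samples, n_features):
        raise ValueError(
            "init = '{}' can only be used when "
            "n_components <= min(n_samples, n_features)".format(init)
        )


# k = 4 exceeds min(m, n) = 3, so every SVD-based init is rejected.
for name in ("nndsvd", "nndsvda", "nndsvdar"):
    try:
        check_nndsvd_components(name, n_components=4, n_samples=5, n_features=3)
    except ValueError as exc:
        print(exc)

# 'random' has no such constraint and passes silently.
check_nndsvd_components("random", n_components=4, n_samples=5, n_features=3)
```

A unit test for the fix could loop over the same three names and assert that each one raises.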

jnothman · 2018-07-29T03:16:06Z

sklearn/decomposition/nmf.py


 def non_negative_factorization(X, W=None, H=None, n_components=None,
-                               init='random', update_H=True, solver='cd',
+                               init=None, update_H=True, solver='cd',


Why are we changing this default init?

Sorry for the super late reply, and thanks for pointing this out. I should have explained it better in the PR notes.

I made this change because the documentation says here

Default: 'nndsvd' if n_components <= min(n_samples, n_features), otherwise random.

But init is set to 'nndsvd' or 'random' only if init is None. See here

    if init is None:
        if n_components <= min(n_samples, n_features):
            init = 'nndsvd'
        else:
            init = 'random'

Then we need to update the documentation to reflect the true default, not make a backwards-incompatible change. You are welcome to update the documentation in a separate, focused PR.

Got it. Let me revert that and create a new PR on that.
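
To make the mismatch in this thread concrete, here is a small sketch of the default-resolution branch quoted above. The function `resolve_init` is a hypothetical stand-in, not the library's code; it only mirrors the quoted `if init is None` logic.

```python
def resolve_init(init, n_components, n_samples, n_features):
    """Hypothetical stand-in for the default-resolution branch."""
    if init is None:
        # Only an unset init triggers the documented rule.
        if n_components <= min(n_samples, n_features):
            return "nndsvd"
        return "random"
    return init


# A signature default of 'random' never reaches the None branch.
print(resolve_init("random", n_components=2, n_samples=6, n_features=4))
# A signature default of None follows the documented rule.
print(resolve_init(None, n_components=2, n_samples=6, n_features=4))
print(resolve_init(None, n_components=5, n_samples=6, n_features=4))
```

With `init='random'` in the signature, the documented "nndsvd if n_components <= min(n_samples, n_features)" behaviour is unreachable unless the caller passes `init=None` explicitly, which is the inconsistency raised here.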

jnothman · 2018-07-29T03:18:35Z

sklearn/decomposition/nmf.py

    check_non_negative(X, "NMF initialization")
    n_samples, n_features = X.shape

+    if (init and init != 'random'


Always use `init is not None` rather than just testing `init` as a bool.
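
A short illustration of why the two spellings are not equivalent. Both function names are hypothetical; they only isolate the guard from the diff line above.

```python
def guard_truthy(init):
    """Guard written as a truthiness test, as in the diff under review."""
    return bool(init and init != "random")


def guard_explicit(init):
    """Guard written with an explicit None comparison."""
    return init is not None and init != "random"


# The two agree on None and on ordinary strings ...
for value in (None, "random", "nndsvd"):
    print(repr(value), guard_truthy(value), guard_explicit(value))

# ... but differ on falsy values that are not None, such as an empty string.
print(repr(""), guard_truthy(""), guard_explicit(""))
```

The truthiness form treats every falsy value as "not set", so an invalid value such as `''` would skip the check silently, while the explicit form skips it only for `None`.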

jnothman · 2018-09-13T09:12:01Z

doc/whats_new/v0.20.rst

 - |Feature| A scorer based on :func:`metrics.brier_score_loss` is also
  available. :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`.

+- Fixed a bug in :class:`decomposition.NMF` where `init = 'nndsvd'`,


This is in the wrong place and should be prefixed by |Fix|

zjpoh · 2018-09-14T03:34:09Z

@jnothman Thank you for your quick turnaround.

The test is failing because NMF and non_negative_factorization have different default init values: NMF has init=None while non_negative_factorization has init='random'.

Per your comment above, we want to update the documentation instead of making a backwards-incompatible change. However, not changing the code means that NMF and non_negative_factorization remain inconsistent.

What would you suggest?

jnothman · 2018-09-15T11:05:38Z

I am okay with leaving them inconsistent for now, or we could change the default in `non_negative_factorization` with a deprecation process (see http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter ).
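
One common shape for such a deprecation is a sentinel default that warns before falling back to the old behaviour. The sketch below is an assumption about how that could look, not the code that was eventually merged: the function name `factorize_sketch`, the `'warn'` sentinel, and the message text are all illustrative.

```python
import warnings


def factorize_sketch(init="warn"):
    """Hypothetical function whose default init is being changed."""
    if init == "warn":
        # Caller did not choose: warn, then keep the old default for now.
        warnings.warn(
            "The default value of init will change from 'random' to None "
            "in a future release. Set init explicitly to silence this warning.",
            FutureWarning,
        )
        init = "random"
    return init


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    effective = factorize_sketch()             # relies on the default
    explicit = factorize_sketch(init="random")  # explicit choice, no warning

print(effective, explicit, len(caught))
```

Callers who pass `init` explicitly see no change, while callers relying on the default get a warning for a release cycle or two before the default actually moves.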

jnothman · 2019-01-16T11:43:11Z

Tests are now failing

jnothman · 2019-02-12T01:26:12Z

doc/whats_new/v0.21.rst

+  `n_components < n_features` instead of
+  `n_components <= min(n_samples, n_features)`. 
+  :issue:`11650` by :user:`Hossein Pourbozorg <hossein-pourbozorg>` and
+  `Zijie (ZJ) Poh <zjpoh>`.


use :user:

jnothman · 2019-02-12T22:44:47Z

Thanks @zjpoh !


zjpoh added 2 commits July 23, 2018 21:28

Fixed initialization procedure condition

5d933b5

Reverted pydoc changes

9caee62

jnothman reviewed Jul 24, 2018


zjpoh changed the title ~~[MRG] Fixed NMF IndexError~~ [WIP] Fixed NMF IndexError Jul 24, 2018

Added unit test

14b1ea6

zjpoh changed the title ~~[WIP] Fixed NMF IndexError~~ [MRG] Fixed NMF IndexError Jul 25, 2018

TomDLT approved these changes Jul 25, 2018


Update per TomDLT comments

0d7c1bd

jnothman reviewed Jul 29, 2018


adrinjalali mentioned this pull request Aug 19, 2018

[WIP] fixes OPTICS split_points detected as NOISE and last point not detected as outlier. #11857

Closed

zjpoh and others added 3 commits September 9, 2018 21:49

Modify per jnothman comment

c759c4b

Merge branch 'master' into nmf_index_error

4fa03ab

Revert changes to default init

5370f4a

This was referenced Sep 13, 2018

non_negative_factorization init is inconsistent with docstring #12062

Closed

[MRG] Fix docstring inconsistency for NMF #12063

Merged

jnothman approved these changes Sep 13, 2018


jnothman reviewed Sep 13, 2018


zjpoh and others added 2 commits September 13, 2018 20:03

Fix per jnothman comments

a1832a0

Merge branch 'master' into nmf_index_error

bc48fbb

TomDLT added the Bug label Sep 24, 2018

zjpoh added 3 commits January 15, 2019 22:45

Merge branch 'master' into nmf_index_error

91b4bb6

Move whats new to v0.21

020742a

Fix docstring

6d56bcd

This was referenced Jan 16, 2019

NMF and non_negative_factorization have inconsistent default init #12988

Closed

[MRG] NMF and non_negative_factorization have inconsistent default init #12989

Merged

zj added 2 commits January 16, 2019 09:01

Fix unit test

c007b38

Fix spacing

0a7426f

zjpoh changed the title ~~[MRG] Fixed NMF IndexError~~ [WIP] Fixed NMF IndexError Jan 23, 2019

flake8 fix

47be426

zjpoh changed the title ~~[WIP] Fixed NMF IndexError~~ [MRG] Fixed NMF IndexError Jan 23, 2019

Merge branch 'master' into nmf_index_error

8fcbf39

jnothman reviewed Feb 12, 2019


Update v0.21.rst

1b998aa

jnothman merged commit 42073c2 into scikit-learn:master Feb 12, 2019

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

[MRG] Fixed NMF IndexError (scikit-learn#11667)

147745e

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "[MRG] Fixed NMF IndexError (scikit-learn#11667)"

5c3b66b

This reverts commit 147745e.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "[MRG] Fixed NMF IndexError (scikit-learn#11667)"

82154f2

This reverts commit 147745e.

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

[MRG] Fixed NMF IndexError (scikit-learn#11667)

5a221be

3 participants