[MRG+1] add Convergence warning in LabelPropagation #5893
jnothman merged 13 commits into scikit-learn:master
Conversation
LGTM. Can you adapt the tests so that they do not raise any convergence warning (or silence them with …)?
Will do.
I can add …
I will rebase this after #9239 is merged in.
(force-pushed from 6005320 to 9260699)
I've rebased this against the current master and have silenced the warnings in the doctests.
      >>> from sklearn import datasets
      >>> from sklearn.semi_supervised import LabelPropagation
    - >>> label_prop_model = LabelPropagation()
    + >>> label_prop_model = LabelPropagation(max_iter=1000)
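For context, a minimal sketch of the kind of setup this doctest uses, assuming the iris data and the roughly-10% unlabelled fraction discussed later in this thread (the mask and seed are illustrative, not the PR's doctest verbatim):

    import numpy as np
    from sklearn import datasets
    from sklearn.semi_supervised import LabelPropagation

    iris = datasets.load_iris()
    rng = np.random.RandomState(42)
    labels = np.copy(iris.target)
    # Unlabelled points are marked with -1; hide ~10% of the labels.
    labels[rng.rand(len(labels)) < 0.1] = -1

    # A larger max_iter lets fit() converge without a ConvergenceWarning.
    label_prop_model = LabelPropagation(max_iter=1000)
    label_prop_model.fit(iris.data, labels)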
Is the default max_iter something we should be changing while we are making other backwards-incompatible changes? Or is the current default reasonable?
To be honest, I don't know. The ConvergenceWarning is partly there to allow users to adjust max_iter depending on their problem.
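As an illustration of that use case, here is a hypothetical sketch (fit_until_converged is not part of the PR, just an example of how a user could react to the warning by growing max_iter):

    import warnings
    from sklearn.exceptions import ConvergenceWarning

    def fit_until_converged(model, X, y, factor=10, attempts=3):
        # Refit with a larger max_iter while fit() warns about convergence.
        for _ in range(attempts):
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always", ConvergenceWarning)
                model.fit(X, y)
            if not any(issubclass(w.category, ConvergenceWarning)
                       for w in caught):
                break
            model.set_params(max_iter=model.max_iter * factor)
        return model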
Added a test.
Well, it seems like changing a parameter from 30 to 1000 in order to be assured of convergence on a small dataset is a bit extreme...
I'm not sure what makes something slow to converge here, except that for spreading, the number of iterations to convergence is inversely proportional to alpha, is related to the number of samples but with a small coefficient, and is unaffected by the proportion of the data that is unlabelled.
For propagation it seems to have a superlinear relation to the proportion unlabelled (with high variance), at least on iris.
A default of 30 seems reasonable for LabelSpreading in general. For the propagation example at least, we should limit unlabelled to 10% of the data.
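A rough sketch of the kind of measurement described above (the proportions and seed are illustrative); n_iter_ is the attribute this PR exposes:

    import numpy as np
    from sklearn import datasets
    from sklearn.semi_supervised import LabelPropagation

    iris = datasets.load_iris()
    rng = np.random.RandomState(0)
    for frac in (0.1, 0.3, 0.5, 0.9):
        labels = np.copy(iris.target)
        labels[rng.rand(len(labels)) < frac] = -1   # hide a fraction of labels
        model = LabelPropagation(max_iter=10000).fit(iris.data, labels)
        print(frac, model.n_iter_)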
Hmm. I'll look into it later today/tomorrow.
jnothman left a comment:
Can you please also add assert_no_warnings in the convergence case?
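A minimal sketch of that assertion, assuming the sklearn.utils.testing helpers of the time and a toy dataset:

    import numpy as np
    from sklearn.semi_supervised import LabelPropagation
    from sklearn.utils.testing import assert_no_warnings

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([0, 0, 1, -1])   # -1 marks the unlabelled sample

    mdl = LabelPropagation(max_iter=1000)
    assert_no_warnings(mdl.fit, X, y)   # must converge without warning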
LabelPropagation and LabelSpreading do seem to have very different behavior when it comes to convergence. I've changed the default number of iterations for them (1000 for LabelPropagation and 30 for LabelSpreading). I've also limited the number of unlabelled entries in the doctests. I am tempted to set the seed in the doctest to make sure that we don't run into accidental failures. What do you think?
I've added assert_no_warnings to one of the tests where convergence was being tested. Should I add it to others as well?
Sounds good.
So was that a "yes" to both:
- adding a seed to the doctests
- adding assert_no_warnings to *all* calls to mdl.fit in the code
?
I don't know that it needs to be on all calls to fit, but for those that depend on convergence, why not?
I've set the seed. Also, I've added assert_no_warnings to the calls to mdl.fit.
Don't set the seed globally. It is not safe for multi-threaded testing, if nothing else.
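A sketch of the safer pattern, using a local RandomState rather than the global numpy seed (names are illustrative):

    import numpy as np

    # Bad: mutates global state shared by concurrently running tests.
    # np.random.seed(0)

    # Good: a local, reproducible generator.
    rng = np.random.RandomState(0)
    unlabelled_mask = rng.rand(150) < 0.1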
jnothman left a comment:
Now that you've put all the assert_no_warnings in there, I think they just add confusion. Sorry. If you want to test convergence, test it with n_iter_. If you want to test that no warnings are issued in convergence (and you should), add this assertion to test_convergence_warning.
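Under those suggestions, test_convergence_warning could look roughly like this; a sketch assuming the 2017-era sklearn.utils.testing helpers, with toy data that may need tuning so that max_iter=1 genuinely fails to converge:

    import numpy as np
    from sklearn.exceptions import ConvergenceWarning
    from sklearn.semi_supervised import LabelPropagation
    from sklearn.utils.testing import assert_no_warnings, assert_warns

    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([0, 0, 1, -1])

    # Too few iterations: fit must warn, and n_iter_ must hit max_iter.
    mdl = LabelPropagation(max_iter=1)
    assert_warns(ConvergenceWarning, mdl.fit, X, y)
    assert mdl.n_iter_ == mdl.max_iter

    # Enough iterations: fit must converge silently.
    mdl = LabelPropagation(max_iter=1000)
    assert_no_warnings(mdl.fit, X, y)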
Thanks.
Ready for review (not sure whether pushing commits to the branch sends out notifications on GitHub).
jnothman left a comment:
Yes, we see commits as they come, but it is often helpful to confirm what they mean with a comment.
Sorry to add more work. You can make the change of defaults a separate PR to make sure we get it into 0.19.
Also, please add your name to the list of file authors.
      def __init__(self, kernel='rbf', gamma=20, n_neighbors=7,
    -              alpha=None, max_iter=30, tol=1e-3, n_jobs=1):
    +              alpha=None, max_iter=1000, tol=1e-3, n_jobs=1):
We can only make this change if we include it in the 0.19 final release. Just to note...
I am not sure I follow, so I'll rephrase what I understood:
If we create a separate PR just with this change, it can get included in the 0.19 final release.
Did I understand the note correctly? If so, I'll create one presently.
                alpha, self.label_distributions_) + y_static
            remaining_iter -= 1

        if remaining_iter <= 1:
I think this is right given the current implementation, but the current implementation is buggy in that it seems you can never obtain n_iter_==max_iter.
Please fix the convergence condition and inline the code for _not_converged. I would then write something like:
    while self.n_iter_ < self.max_iter:
        if ... < tol:
            break
        # do stuff
        self.n_iter_ += 1
    else:
        # warn

LabelPropagation converges much slower than LabelSpreading. The default of max_iter=30 works well for LabelSpreading but not for LabelPropagation. This was extracted from #5893.
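A runnable sketch of that while/else pattern, with a toy contraction standing in for the label-distribution update (Toy is illustrative, not the PR's code):

    import warnings

    class Toy(object):
        def __init__(self, max_iter=30, tol=1e-3):
            self.max_iter, self.tol = max_iter, tol

        def fit(self, x):
            self.n_iter_ = 0
            while self.n_iter_ < self.max_iter:
                new = 0.5 * x                 # toy update step
                if abs(new - x) < self.tol:   # converged: skip the else
                    break
                x = new
                self.n_iter_ += 1
            else:
                # Runs only when the loop condition fails, i.e. when
                # n_iter_ == max_iter, so the warning is reachable.
                warnings.warn("max_iter=%d reached without convergence"
                              % self.max_iter)
            return self

    Toy(max_iter=5).fit(1.0)    # warns: still far from tol after 5 steps
    Toy(max_iter=30).fit(1.0)   # converges silently with n_iter_ < 30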
(force-pushed from dd9e8fb to 54a5f26)
    -         and remaining_iter > 1):
    +     self.n_iter_ = 0
    +     while self.n_iter_ < self.max_iter:
I suppose we could now do this as a for loop...
You mean setting self.n_iter_ using the loop variable?
I personally feel a bit uncomfortable using the loop variable outside the loop.
(* angsty feeling *)
So ... shall I make this change to expedite the merge?
I think he meant

    for n_iter in range(self.max_iter):
With the for loop, there are two alternatives:

Alternative 1

Setting self.n_iter_ after the for loop. It would look a bit ugly because we'll need to figure out whether in the last iteration of the loop the tolerance condition was met (then self.n_iter_ = n_iter) or the else: branch was reached (in which case self.n_iter_ = n_iter + 1):

    self.n_iter_ = 0
    for n_iter in range(self.max_iter):
        if converged:
            break
        # ...
    else:
        # warn
        self.n_iter_ = 1  # Count the last iteration.
    self.n_iter_ += n_iter

Alternative 2

Leaving self.n_iter_ += 1 inside the for loop. In this case the loop variable (i.e. n_iter) is not used anywhere:

    self.n_iter_ = 0
    for n_iter in range(self.max_iter):
        if converged:
            break
        # ...
        self.n_iter_ += 1
    else:
        # warn

Were any of these versions what you (both of you) had in mind? Personally, I like the while loop and then Alternative 2. :)
@amueller In the new proposal, self.n_iter_ can never be zero, and zero is a valid value technically. The handling of these corner cases is what makes the for loop slightly ugly.
I'm sorry, I am away from my laptop and am a bit constrained in my explanations.
This comment was a cosmetic one that is not worth withholding the merge for.
So ... is that a "go" for the while loop? :-)
Whatever you think is most readable. I'd prefer

    for self.n_iter_ in ...:
        ...
    else:
        ...

I think, but it matters very little.
First, I didn't know that we could use self.n_iter_ in the for loop; today I learned. :)
Second, this is the best implementation I can think of which has the same behavior as the original while loop:

    for self.n_iter_ in range(max_iter):
        # ...
    else:
        # ...
        self.n_iter_ += 1

The self.n_iter_ += 1 in the else: clause is necessary to ensure that self.n_iter_ == self.max_iter is possible and that it correctly happens if the loop doesn't break out due to convergence. This was the issue we were out to fix originally. :)
With this implementation, I am quite happy with the for loop as well. I'll make this change. 👍
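A self-contained check of that final pattern (Toy and run are illustrative): the loop variable is the attribute itself, and the else-branch bump makes n_iter_ == max_iter reachable:

    import warnings

    class Toy(object):
        max_iter = 5

        def run(self, converges_at=None):
            for self.n_iter_ in range(self.max_iter):
                if converges_at is not None and self.n_iter_ >= converges_at:
                    break   # converged: keep the loop variable's value
            else:
                warnings.warn("did not converge")
                self.n_iter_ += 1   # bump 4 -> 5 so n_iter_ == max_iter
            return self.n_iter_

    t = Toy()
    assert t.run(converges_at=2) == 2   # broke out early
    assert t.run() == 5                 # exhausted: bumped to max_iter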
Do we want this for 0.19?
LGTM, no strong opinion on the for loop, though it would be slightly nicer.
I'm happy for this to be merged. I don't consider it essential for 0.19, but we are newly encouraging people to use these estimators, so why not.
Also, add tests to verify that n_iter_ == max_iter if the warning is raised.
Bump!
Otherwise, it remains unclear whether convergence was reached or whether the algorithm ran out of iterations. Currently, all the test cases trigger this warning. That is what triggered the investigation which led to #5774.