[MRG + 1] Fix for OvR partial_fit bug by srivatsan-ramesh · Pull Request #7786 · scikit-learn/scikit-learn

srivatsan-ramesh · 2016-10-29T18:11:10Z

Reference Issue

What does this implement/fix? Explain your changes.

partial_fit() of OvR did not work properly when a mini-batch did not contain all the classes.
LabelBinarizer.fit() should be called with the classes_ attribute, not with the y argument of partial_fit().
Tests were also added to check the partial_fit() function.
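
To illustrate the difference described above, here is a minimal sketch (not the code from this PR) using the public LabelBinarizer API. The class list and the mini-batch values are made up for illustration. Fitting the binarizer on a mini-batch that is missing a class yields fewer indicator columns than fitting it on the full class list.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical data: three classes overall, but the mini-batch only has two.
all_classes = np.array([0, 1, 2])
batch_y = np.array([0, 1, 1, 0])

# Fitting on the mini-batch: the binarizer only learns classes 0 and 1.
lb_batch = LabelBinarizer().fit(batch_y)
# Fitting on the full class list: one indicator column per known class.
lb_classes = LabelBinarizer().fit(all_classes)

Y_batch = lb_batch.transform(batch_y)
Y_classes = lb_classes.transform(batch_y)

print(Y_batch.shape)    # a single column, since only two classes were seen
print(Y_classes.shape)  # three columns, one per known class
```

With the second binarizer, the column for the absent class is simply all zeros, so each per-class estimator still receives a target vector of the right length.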

Any other comments?

The PR #6239 did not seem to be the correct fix.

amueller · 2016-11-02T20:06:56Z

LGTM, thanks :)

lesteve · 2016-11-04T18:34:59Z

sklearn/tests/test_multiclass.py

+    # A new class value which was not in the first call of partial_fit
+    # It should raise ValueError
+    y1 = [5] + y[7:-1]
+    assert_raises(ValueError, ovr.partial_fit, X=X[7:], y=y1)


Can you use assert_raises_regex to check the error message as well?

@lesteve Done!

but then you can drop this line.

raghavrv · 2016-11-05T10:51:54Z

Travis seems to fail... Could you fix that?

srivatsan-ramesh · 2016-11-05T10:57:04Z

@raghavrv It says "HTTPError: 409 Client Error". What do I do?

raghavrv · 2016-11-05T10:58:44Z

I've not looked into it, but I've restarted the build for you... Let's see if it is some spurious error.

srivatsan-ramesh · 2016-11-05T11:00:25Z

Ok 👍

srivatsan-ramesh · 2016-11-05T11:14:26Z

@raghavrv Thank you, Travis ran successfully.

jnothman

Otherwise LGTM

jnothman · 2016-11-05T12:50:53Z

sklearn/multiclass.py

+            self.label_binarizer_ = LabelBinarizer(sparse_output=True)
+            self.label_binarizer_.fit(self.classes_)
+
+        if not set(self.classes_).issuperset(y):


For large y, might iteration be much slower than using np.setdiff1d?

Yes, np.setdiff1d seems to be faster.

Because of the use of np.setdiff1d, it is not possible to call partial_fit with a sparse y.
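
The two checks discussed in this thread can be compared side by side. This is a small sketch with made-up labels; the helper name unseen_labels is hypothetical and is not part of scikit-learn.

```python
import numpy as np


def unseen_labels(y, classes):
    """Return the sorted unique labels in y that are not in classes."""
    return np.setdiff1d(y, classes)


classes = np.array([0, 1, 2, 3])
ok_batch = np.array([1, 1, 2])       # subset of the known classes
bad_batch = np.array([5, 1, 2, 5])   # contains the unknown label 5

# The set-based check from the original diff, for comparison.
set_check_ok = set(classes).issuperset(ok_batch)
set_check_bad = set(classes).issuperset(bad_batch)

print(unseen_labels(ok_batch, classes))   # empty array
print(unseen_labels(bad_batch, classes))  # [5]
```

Besides staying inside NumPy, the np.setdiff1d variant also returns the offending labels, which is convenient for building an error message.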

jnothman · 2016-11-05T12:55:14Z

sklearn/tests/test_multiclass.py

+    assert_equal(np.mean(pred == y), np.mean(pred1 == y))
+
+    # Test when mini batches doesn't have all classes
+    # with SGDClassifier


As a reader of the test, it's not clear why you would want to test with both these base estimators.

Also: can we do it in a loop for clarity that they're the same test?

In the previous PR #6239, the author created a fix that worked with MultinomialNB but not with SGDClassifier. That's why I added SGDClassifier as well. But I think testing with SGDClassifier alone is enough, @jnothman?

Yes, happy for it to just be SGDClassifier
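
For readers who want to try the scenario this test covers, here is a minimal sketch, assuming a scikit-learn version that includes this fix. The data is random and the batch split is made up; it is not the test from the PR.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.rand(12, 3)
# The first six labels contain no class 2; the last six do.
y = np.array([0, 1, 0, 1, 0, 1, 2, 2, 0, 1, 2, 2])

ovr = OneVsRestClassifier(SGDClassifier(random_state=0))

# The first call declares every class, even though this batch lacks class 2.
ovr.partial_fit(X[:6], y[:6], classes=np.array([0, 1, 2]))
# Later calls may contain any subset of the declared classes.
ovr.partial_fit(X[6:], y[6:])

pred = ovr.predict(X)

# A label that was never declared should be rejected.
try:
    ovr.partial_fit(X[:2], np.array([5, 0]))
    raised = False
except ValueError:
    raised = True

print(ovr.classes_, len(ovr.estimators_), raised)
```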

jnothman · 2016-11-05T12:56:06Z

sklearn/tests/test_multiclass.py

+    # A new class value which was not in the first call of partial_fit
+    # It should raise ValueError
+    y1 = [5] + y[7:-1]
+    assert_raises(ValueError, ovr.partial_fit, X=X[7:], y=y1)


but then you can drop this line.

jnothman · 2016-11-05T21:27:47Z

Thanks, @srivatsan-ramesh

jnothman · 2016-11-05T21:28:21Z

Sorry, I forgot to have you write an entry in doc/whats_new.rst. Could you please submit a quick PR describing the fix?

srivatsan-ramesh · 2016-11-06T07:01:27Z

@jnothman Created a PR!

elcombato · 2016-11-14T16:34:11Z

Due to np.setdiff1d in this line of partial_fit, it is not possible to pass a sparse matrix for y.

Both the commit ed08d38 by @srivatsan-ramesh and the reviewed version by @jnothman 5fd925a throw an error with a sparse y.

Or is there an alternative, if I use partial_fit to perform multilabel classification and pass a sparse matrix of y which is binarized with MultiLabelBinarizer?

if y is sparse check with:

if np.setdiff1d(y.indices, self.classes_)
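
The suggestion above can be sketched as follows. This is an illustration under stated assumptions, not a tested patch: it assumes y is a CSR multilabel indicator matrix without explicitly stored zeros, and that classes are identified by column index. The matrix values and the variable names are made up.

```python
import numpy as np
from scipy import sparse

# Hypothetical multilabel indicator matrix: 3 samples, 4 label columns.
y_dense = np.array([[1, 0, 0, 1],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1]])
y_sparse = sparse.csr_matrix(y_dense)

# Assumption: the known classes are the column indices 0, 1 and 2.
known = np.array([0, 1, 2])

# For a CSR matrix, `indices` holds the column index of every stored entry.
unknown = np.setdiff1d(y_sparse.indices, known)

# Test the size rather than the truth value of the array, which is
# ambiguous when more than one element is returned.
has_unknown = unknown.size > 0

print(unknown)  # [3]
```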

lesteve · 2016-11-15T10:42:07Z

Hmmm the docstring does say that sparse matrices are accepted for y. I am not sure what the scikit-learn convention is in general about whether a sparse y should be accepted in fit and similar functions.

I guess it is not really worth using a sparse matrix for y in general. A work-around is to do y = y.todense() before calling partial_fit.

jnothman · 2016-11-15T11:17:06Z

sparse y may be used to represent multilabel problems. It could be accepted
wherever multilabel inputs are, in theory.

lesteve · 2016-11-15T13:29:00Z

sparse y may be used to represent multilabel problems. It could be accepted wherever multilabel inputs are, in theory.

I see, it does look like we are missing tests for this then ... it would be good to do something similar to #7590 for multilabel problems, i.e. test that sparse and dense y are giving results that are close to each other.

srivatsan-ramesh added 2 commits October 29, 2016 22:57

mini-batch can now contain less number of classes than actual data

ed08d38

added tests where mini batches doesn't contain all classes

cd527ee

srivatsan-ramesh mentioned this pull request Oct 29, 2016

Fix for OvR partial_fit in various edge cases #6239

Closed

srivatsan-ramesh changed the title ~~Fix for OvR partial_fit bug~~ [MRG] Fix for OvR partial_fit bug Oct 29, 2016

reducing line lengths

6f27790

amueller added this to the 0.19 milestone Nov 1, 2016

amueller added the Bug label Nov 1, 2016

amueller modified the milestones: 0.18.1, 0.19 Nov 2, 2016

amueller changed the title ~~[MRG] Fix for OvR partial_fit bug~~ [MRG + 1] Fix for OvR partial_fit bug Nov 2, 2016

lesteve reviewed Nov 4, 2016

assert_raises_regex check is added

c428554

jnothman requested changes Nov 5, 2016

removed unwanted tests and assert stmnts

5fd925a

jnothman approved these changes Nov 5, 2016

jnothman merged commit 9fd70a8 into scikit-learn:master Nov 5, 2016

srivatsan-ramesh deleted the dev branch November 6, 2016 06:34

srivatsan-ramesh added a commit to srivatsan-ramesh/scikit-learn that referenced this pull request Nov 6, 2016

added bug fix scikit-learn#7786 to whats_new

ff837c0

srivatsan-ramesh mentioned this pull request Nov 6, 2016

Added bug fix #7786 to whats_new.rst #7830

Merged

jnothman pushed a commit that referenced this pull request Nov 6, 2016

DOC Added bug fix #7786 to whats_new.rst (#7830)

2e4b589

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Nov 9, 2016

[MRG+2] Fix for OvR partial_fit bug (scikit-learn#7786)

6c385bd

* mini-batch can now contain less number of classes than actual data * added tests where mini batches doesn't contain all classes

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Nov 9, 2016

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

1858693

lesteve mentioned this pull request Nov 16, 2016

Test support for sparse multilabel format #7886

Open

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+2] Fix for OvR partial_fit bug (scikit-learn#7786)

b308398

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

e1c0e9a

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+2] Fix for OvR partial_fit bug (scikit-learn#7786)

e9e750d

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

b369a51

NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+2] Fix for OvR partial_fit bug (scikit-learn#7786)

6d7fc75

NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

1fe20bd

paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+2] Fix for OvR partial_fit bug (scikit-learn#7786)

0a8d6a8

paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

e7a8705

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

DOC Added bug fix scikit-learn#7786 to whats_new.rst (scikit-learn#7830)

4aceb23
