[MRG+1] Patch liblinear for sample_weights in LogisticRegression (and CV) #5274
Conversation
force-pushed from 0aff304 to a700d3c
force-pushed from a700d3c to 2ef06ac
|
ping @vstolbunov. Also @fabianp, could you have a look, since you know this part of the code very well? |
force-pushed from 2ef06ac to 268b1bc
|
tests pass now. |
|
I took a look last night and they had passed, so I wasn't sure what the problem was. |
|
I think I forgot to build it locally and hence I got some segmentation faults. But now it's all right |
|
@TomDLT you might want to review this. |
|
You should also compare the results between liblinear and the other solvers, as in this test. And testing it reveals that liblinear does not handle |
|
It looks pretty good to me. (yet I am not a C++ master) |
force-pushed from 6e6ec68 to 867596b
|
@TomDLT Can you update the PR to MRG+1 if you are happy? (Btw, I don't know what your definition of a C++ master is, but whatever it is I'm not one either :P) |
force-pushed from 867596b to 5134286
|
If you are looking for a C++ master, @larsmans is a good candidate :) |
sklearn/svm/base.py
PEP8: put the check_consistent_length import on its own line.
force-pushed from 75758f6 to 84668a7
|
You should also be supporting in |
|
I thought that was for |
I'd thought it was a general patch to liblinear, but maybe...
force-pushed from b11922c to c01faae
|
@jnothman So we'll merge this and add support for other solvers later? |
force-pushed from c01faae to 9d8d7e4
|
Rebased. Would be great if someone can give a final +1 |
|
played with this a bit and it worked great. Merging. |
|
The .pyx file doesn't compile with Cython 0.21, 0.22 or 0.23. You used 0.20; I'll try that next. I'm pretty scared of the casting that is going on there. This was found by @arthurmensch in #5557 |
|
Installed 0.20, still doesn't compile. |
|
I think I just forgot to push the generated C files. Just a second. |
|
How does the objective function change in the case of sample_weight for logistic regression? Can you please provide the mathematical expression? I assume the objective function changes like this: E(\mathbf{w}) = - \sum_{n=1}^{N} \{ s_n t_n \ln y_n + (1 - s_n t_n) \ln(1 - y_n) \}, where s_n is the sample_weight of the nth sample. The equation above is modified from equation 4.90 of Christopher Bishop's PRML book. Clarification: the equation is written in LaTeX; I could not post it as an image. |
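For comparison, under the "samples repeated in proportion to their weight" convention, the weight s_n scales each sample's entire loss term rather than being folded into the target terms, i.e. in the same notation (a sketch of that convention, not a quote from the implementation):

```latex
% Weighted binary cross-entropy: s_n scales the whole per-sample term
E(\mathbf{w}) \;=\; -\sum_{n=1}^{N} s_n \bigl[\, t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \bigr]
```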
|
@niteshroyal this is not the right place to ask usage questions, see http://scikit-learn.org/dev/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage |
|
ordinarily, weighting means solving an objective that is equivalent to
having the samples repeated in proportion to their weight
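That equivalence is easy to check directly: with integer weights, a weighted log loss matches the unweighted loss over a dataset in which each sample is repeated weight-many times. A minimal sketch in plain Python (the weighted_log_loss helper below is hypothetical, written here for illustration, not scikit-learn's log_loss):

```python
import math

def weighted_log_loss(y_true, y_prob, sample_weight=None):
    # Hypothetical helper: binary cross-entropy, each sample's term
    # scaled by its weight (weights default to 1).
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    return -sum(w * (t * math.log(p) + (1 - t) * math.log(1 - p))
                for w, t, p in zip(sample_weight, y_true, y_prob))

y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.6]

# Weighting the first sample by 3 ...
weighted = weighted_log_loss(y_true, y_prob, sample_weight=[3, 1, 1])
# ... gives the same loss as repeating that sample three times, all weights 1.
repeated = weighted_log_loss([1, 1, 1, 0, 1], [0.9, 0.9, 0.9, 0.2, 0.6])
assert math.isclose(weighted, repeated)
```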
|
@jnothman I have seen that when the class weights are too imbalanced, adding more degrees of freedom to the model won't always result in a lower (accordingly weighted) log_loss. So I suspect that, barring a numerical issue I'm unaware of, the objective function is not exactly the same as the one log_loss is evaluating (equivalent to "samples repeated in proportion to their weight"). Of course, this isn't a generalization problem; the loss was computed over the training set. The parameter |
|
Ok, I think it was a numerical issue indeed, playing with the |
This adds sample_weight to the liblinear solver for
LogisticRegression and LogisticRegressionCV. It had already been added to the other solvers in another PR
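As a quick usage sketch of the feature this PR adds (the toy data here is invented for illustration, and the behaviour assumes a scikit-learn build that includes this patch):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: the first two samples are class 0, the last two class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 1.0, 5.0, 1.0])  # up-weight the third sample

# With this patch, the liblinear solver honours sample_weight in fit().
clf = LogisticRegression(solver='liblinear')
clf.fit(X, y, sample_weight=w)

print(clf.predict([[3.0]]))  # expected to be class 1 on this separable data
```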