[MRG+1] Add sample_weight support to RidgeClassifier #4838
amueller merged 1 commit into scikit-learn:master
Conversation
looks good
probably worth looking at #4490
LGTM
thanks.
Thanks for the reviews guys!
This is a nice test, but is more appropriate as a common test than one specific to Ridge, and I think it's time for there to be common tests regarding sample weight. They would best rely on:
- the ability to distinguish an estimator which accepts sample_weight from one which does not.
- something like
assert_same_model from [WIP] Adding tests for estimators implementing partial_fit and a few other related fixes / enhancements #3907
Agreed @jnothman, nothing special to Ridge here; it is purely a test for classifiers with class_weight (constructor) and sample_weight (fit) to check equivalency and multiplicative combination. It should be able to work its way into a common test, I would think. This test appears in the tree tests (though with an additional multi-output step) and elsewhere (I think I may have pilfered or altered it from SGD originally).
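A minimal sketch of the kind of equivalence being discussed (the data and the specific class_weight dict here are illustrative, not taken from the actual test):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Toy two-class problem
X = np.array([[1., 1.], [1., 0.], [0., 1.],
              [2., 1.], [0., 2.], [1., 3.]])
y = np.array([0, 0, 0, 1, 1, 1])

# class_weight={0: 2, 1: 3} should be equivalent to per-sample
# weights of 2 for class-0 samples and 3 for class-1 samples
clf_cw = RidgeClassifier(alpha=0.1, class_weight={0: 2., 1: 3.}).fit(X, y)
sw = np.where(y == 0, 2., 3.)
clf_sw = RidgeClassifier(alpha=0.1).fit(X, y, sample_weight=sw)

np.testing.assert_array_almost_equal(clf_cw.coef_, clf_sw.coef_)
np.testing.assert_array_almost_equal(clf_cw.intercept_, clf_sw.intercept_)
```

The multiplicative-combination half of the test would additionally pass a base sample_weight together with class_weight and compare against sample_weight multiplied by the per-class weights.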
I'd wondered whether it was copied from elsewhere. That's upsetting.
Perhaps we need an issue to track tests that should be factored into common...
Upsetting how? That it's propagating outside of common tests?
The best lineage I can trace right now is @dsullivan7's #3931 (a somewhat similar test on multiplicative weights), then RF & D-Tree in #3961, and now here in Ridge. I cannot recall the exact provenance of what I used from the RF/tree PR tests. I don't see any duplicated inline comments outside of RF/Tree/Ridge on git, FWIW.
Note that in the tree-based classifiers I tested for equivalency in feature_importances_, and here it's coef_.
Something similar in #4215 coming soon enough too, I imagine, for the rest of the ensemble tests.
Note that in tree-based classifiers I tested for equivalency in feature_importances_, and here it's coef_.
Hence the use of assert_same_model.
On 10 June 2015 at 14:51, Joel Nothman joel.nothman@gmail.com wrote:
Upsetting that we're duplicating code and then going to need to rein it in.
There are already common tests for class_weight, and special ones for linear classifiers:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1035
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1087
Adding one that checks whether sample_weight is equivalent to the corresponding class_weight would be nice, too. It should be pretty straightforward, though. Why would you need a helper? Just check that decision_function is almost_equal.
True... as long as we don't have any classification-oriented transformers that are not predictors that accept sample and class weights!
Could you also modify this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L285
so that has_sw is False if sample_weight == 1.0.
sample_weight is currently implemented with a rescaling of the data. We can avoid copying the data in the case sample_weight == 1.0.
Another option is to check for sample_weight == 1.0 directly in _rescale_data.
sample_weight == 1.0 is semantically equivalent to sample_weight.ndim == 0?
For scalar values other than 1, sample_weight has the same effect as changing C or 1/alpha. So this is not very useful; we could potentially raise an exception.
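The equivalence is visible in the closed-form ridge solution; a numpy-only sketch (not scikit-learn code) showing that a scalar weight c behaves the same as dividing alpha by c:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)
alpha, c = 2.0, 5.0

def ridge_coef(X, y, alpha, w=1.0):
    # closed form (X'WX + alpha*I)^-1 X'Wy, with W = w * identity here
    A = w * X.T.dot(X) + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, w * X.T.dot(y))

coef_weighted = ridge_coef(X, y, alpha, w=c)  # scalar sample_weight c
coef_rescaled = ridge_coef(X, y, alpha / c)   # unweighted fit, alpha / c
np.testing.assert_allclose(coef_weighted, coef_rescaled)
```

Dividing (c X'X + alpha I)^-1 c X'y through by c gives (X'X + (alpha/c) I)^-1 X'y, hence the two fits coincide.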
Added sample_weight support to RidgeClassifier, as it was in RidgeClassifierCV but not in the non-CV implementation. Also added some tests to both of the above to check that it reacts as expected when compared to class_weight from the constructor.
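With this merged, usage looks like the following (toy data for illustration only):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., 1.]])
y = np.array([0, 0, 1, 1])
weights = np.array([1., 1., 2., 2.])  # up-weight the class-1 samples

# sample_weight is now accepted by RidgeClassifier.fit, matching
# what RidgeClassifierCV already supported
clf = RidgeClassifier().fit(X, y, sample_weight=weights)
pred = clf.predict(X)
```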