add common test that zero sample weight means samples are ignored #15015

amueller wants to merge 3 commits into scikit-learn:master
Conversation
Added decision_function and predict_proba to the test as well. For the CV estimators, I guess the splits are different because the CV is influenced by the zero-weight samples. That's a bit weird, but I'm not sure what the expected behavior should be.
The whole SVM family should be fixed by #14286. Waiting for a second review there. Edit: actually, that PR only solves the issue with negative weights.
```python
y2 = _enforce_estimator_tags_y(estimator3, y2)
# Give the appended duplicate samples zero weight.
weights = np.ones(shape=len(y) * 2)
weights[len(y):] = 0
X2, y2, weights = shuffle(X2, y2, weights, random_state=0)
```
Shuffling the data might have an impact, doesn't it? I checked, and OneClassSVM gives the same result if you do not shuffle the data.
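For illustration, a minimal standalone sketch of that no-shuffle variant (this is not the actual diff under review, and it assumes libsvm drops zero-weight samples):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.rand(40, 2)

# Append the zero-weight duplicates instead of shuffling, so the order
# of the non-zero-weight samples is unchanged.
X2 = np.vstack([X, X])
weights = np.ones(len(X) * 2)
weights[len(X):] = 0

est1 = OneClassSVM(gamma='scale').fit(X)
est2 = OneClassSVM(gamma='scale').fit(X2, sample_weight=weights)

# Reportedly identical when the sample order is preserved.
assert np.allclose(est1.decision_function(X), est2.decision_function(X))
```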
Yes indeed, your test is better with respect to that, I think. I didn't want things to be sorted by weight (the CV estimators would fail), but I guess I would need to ensure that the order of the non-zero-weight samples stays the same (which is what you did, I think?).
Also, the default tol is too loose to compare the decision functions. I just tried SVC with tol=1e-12, and then the decision functions are really close.
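A sketch of that experiment (tol=1e-12 is the value mentioned above; the atol is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=40, random_state=0)
X2, y2 = np.vstack([X, X]), np.hstack([y, y])
weights = np.ones(len(y) * 2)
weights[len(y):] = 0

# With the default tol=1e-3 the decision functions can differ noticeably;
# with a much stricter tolerance they agree closely.
svc1 = SVC(tol=1e-12).fit(X, y)
svc2 = SVC(tol=1e-12).fit(X2, y2, sample_weight=weights)

assert np.allclose(svc1.decision_function(X), svc2.decision_function(X),
                   atol=1e-6)
```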
I'd say the splitters should take the weights into account when shuffling/splitting the data. If the data has a lot of samples with zero weight, ignoring that fact would mean some splits may consist entirely of samples with sample_weight=0. Also, for a shuffled k-fold split it does make sense (to me) to have a similar total weight across the splits. That said, I'm not sure it's a reasonable expectation to get the same set of non-zero-weight samples after introducing a bunch of zero-weighted samples into the data.
I agree with your reasoning. However, that's a behavior change and also an API change: right now, the signature of `split` does not accept sample weights.

It might also be tricky to define what the desired behavior is. Say for KFold we want the sum of the weights to be the same for each fold (instead of the number of samples). Do we want this to be as stable as possible, or do we want to make sure the folds also have similar numbers of samples? And what does it mean for the other splitters?

Now suppose we do that. I assume the case with all equal sample weights (all ones) will result in a degenerate relaxed solution, and so the rounding will be tricky. In that case we can't recover the original unweighted solution from the all-ones case :-/ That possibly means we need to come up with a greedy heuristic (as we did in other cases) that reduces to the unweighted case more easily.

And finally, as you said, we need to define our expectations. We already give different results after shuffling the data in some cases (should this be a tag? hmm... for randomized algorithms this is a bit tricky to define). Do we want to say that if we replace a sample with two identical samples with half the weight, the results should be the same? That's probably not mathematically true for SGD, right?

Right now I think the easiest solution might be having a tag that says that an estimator depends on the order of the data.
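To make the rounding difficulty concrete, here is a hedged sketch of one naive greedy heuristic (the function is hypothetical, not scikit-learn API):

```python
import numpy as np

def greedy_weight_balanced_folds(sample_weight, n_splits):
    """Hypothetical greedy assignment: each sample (in order) goes to
    the fold whose total weight is currently smallest."""
    fold_weights = np.zeros(n_splits)
    fold_of = np.empty(len(sample_weight), dtype=int)
    for i, w in enumerate(sample_weight):
        f = np.argmin(fold_weights)
        fold_of[i] = f
        fold_weights[f] += w
    return [np.flatnonzero(fold_of == f) for f in range(n_splits)]

# With all-ones weights this degenerates to round-robin assignment,
# *not* the contiguous blocks plain KFold produces -- one concrete way
# the reduction to the unweighted case is tricky.
print(greedy_weight_balanced_folds(np.ones(6), 3))
```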
I am not sure we want to go down the way of making the splitters aware of sample weights.
Merged master to trigger a CI run with the result of #14286. I also updated the description so we can tick off estimators and track progress.
#14286 alone is not enough to fix the common test for models in the SVM family. Maybe a stricter tol would also be required.
How about marking the failing estimators with xfail, opening a follow-up issue, and merging this? It would be helpful to have this test in master (otherwise one has to check out this branch and locally merge master to do anything about it), while if it were in master, one could simply run the new check with pytest.
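Roughly, in pytest terms (the test name, estimator list, and reason string here are illustrative, not the actual common-test code):

```python
import pytest
from sklearn.utils import all_estimators

# Hypothetical list of estimators known to fail the zero-weight check.
KNOWN_FAILURES = {'SVC', 'NuSVC', 'OneClassSVM'}

@pytest.mark.parametrize('name, Estimator', all_estimators())
def test_zero_sample_weight_is_ignored(name, Estimator):
    if name in KNOWN_FAILURES:
        pytest.xfail('zero sample_weight not yet ignored; see follow-up issue')
    # ... run the actual zero-weight invariance check here ...
```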
I continued this PR in #16507, as suggested above, which would allow merging this test into master with some estimators marked as known failures.
Adding a test following #10873.
This is a pretty simple sample_weight test asserting that a weight of 0 is equivalent to not having the samples at all.
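A minimal standalone sketch of the idea (the estimator and tolerances are illustrative; the real common test is generic over estimators):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.randint(2, size=50)

# Duplicate every sample and give the duplicates zero weight.
X2 = np.vstack([X, X])
y2 = np.hstack([y, y])
weights = np.ones(len(y) * 2)
weights[len(y):] = 0
X2, y2, weights = shuffle(X2, y2, weights, random_state=0)

est1 = LogisticRegression(tol=1e-8, max_iter=1000).fit(X, y)
est2 = LogisticRegression(tol=1e-8, max_iter=1000).fit(
    X2, y2, sample_weight=weights)

# The zero-weight duplicates should be ignored entirely.
assert np.allclose(est1.coef_, est2.coef_, atol=1e-6)
```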
I think every failure here should be considered a bug.
I wonder if, for the SGD-based algorithms, there's an issue where we shrink `w` even when we see a sample with sample weight 0.
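To make that hypothesis concrete, here is a schematic of an SGD step with L2 regularization (pure illustration, not scikit-learn's actual Cython implementation), where the shrinkage is applied for every visited sample regardless of its weight:

```python
import numpy as np

def sgd_step(w, x, y, weight, lr=0.01, alpha=1e-4):
    # Schematic squared-loss SGD step. The data-term gradient is scaled
    # by the sample weight, so it vanishes for weight == 0 ...
    grad = weight * (np.dot(w, x) - y) * x
    # ... but if the L2 shrinkage is applied for every visited sample,
    # regardless of its weight, w still changes -- so a zero-weight
    # sample is not a no-op.
    w = w * (1 - lr * alpha)
    return w - lr * grad

w = np.ones(3)
print(sgd_step(w, np.zeros(3), 0.0, weight=0.0))  # w shrank despite weight 0
```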