add common test that zero sample weight means samples are ignored #15015

amueller wants to merge 3 commits into scikit-learn:master
Conversation
Added decision_function and predict_proba to the test as well. For the CV estimators, I guess the splits are different because the CV is influenced by the zero-weight samples. That's a bit weird, but I'm not sure what the expected behavior should be.
The whole SVM family should be fixed by #14286. Waiting for a second review there. Edit: actually, that PR only solves the issue with negative weights.
```python
y2 = _enforce_estimator_tags_y(estimator3, y2)
# Give the appended duplicate samples zero weight.
weights = np.ones(shape=len(y) * 2)
weights[len(y):] = 0
X2, y2, weights = shuffle(X2, y2, weights, random_state=0)
```
Shuffling the data might have an impact, doesn't it? I checked, and OneClassSVM gives the same result if you do not shuffle the data.
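For illustration, a minimal standalone sketch of that no-shuffle variant (this is not the actual diff under review, and it assumes libsvm drops zero-weight samples):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.rand(40, 2)

# Append the zero-weight duplicates instead of shuffling, so the order
# of the non-zero-weight samples is unchanged.
X2 = np.vstack([X, X])
weights = np.ones(len(X) * 2)
weights[len(X):] = 0

est1 = OneClassSVM(gamma='scale').fit(X)
est2 = OneClassSVM(gamma='scale').fit(X2, sample_weight=weights)

# Reportedly identical when the sample order is preserved.
assert np.allclose(est1.decision_function(X), est2.decision_function(X))
```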
Yes indeed, your test is better with respect to that, I think. I didn't want things to be sorted by weight (the CV estimators would fail), but I guess I would need to ensure that the order of the non-zero-weight samples stays the same (which is what you did, I think?).
Also, the default tol is too loose to compare the decision functions. I just tried SVC with tol=1e-12, and then the decision functions are really close.
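A sketch of that experiment (tol=1e-12 is the value mentioned above; the atol is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=40, random_state=0)
X2, y2 = np.vstack([X, X]), np.hstack([y, y])
weights = np.ones(len(y) * 2)
weights[len(y):] = 0

# With the default tol=1e-3 the decision functions can differ noticeably;
# with a much stricter tolerance they agree closely.
svc1 = SVC(tol=1e-12).fit(X, y)
svc2 = SVC(tol=1e-12).fit(X2, y2, sample_weight=weights)

assert np.allclose(svc1.decision_function(X), svc2.decision_function(X),
                   atol=1e-6)
```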
I'd say the splitters should take the weights into account when shuffling/splitting the data. If the data has a lot of samples with zero weight, ignoring that fact would mean some splits may consist entirely of samples with sample_weight=0. Also, for a shuffled k-fold split it does make sense (to me) to have a similar total weight across the splits. That said, I'm not sure it's a reasonable expectation to get the same set of non-zero-weight samples after introducing a bunch of zero-weighted samples into the data.
I agree with your reasoning. However, that's a behavior change and also an API change: right now, the signature of `split` does not accept sample weights.

It might also be tricky to define what the desired behavior is. Say for KFold we want the sum of the weights to be the same for each fold (instead of the number of samples). Do we want this to be as stable as possible, or do we want to make sure the folds also have similar numbers of samples? And what does it mean for the other splitters?

Now suppose we do that. I assume the case with all equal sample weights (all ones) will result in a degenerate relaxed solution, and so the rounding will be tricky. In that case we can't recover the original unweighted solution from the all-ones case :-/ That possibly means we need to come up with a greedy heuristic (as we did in other cases) that reduces to the unweighted case more easily.

And finally, as you said, we need to define our expectations. We already give different results after shuffling the data in some cases (should this be a tag? hmm... for randomized algorithms this is a bit tricky to define). Do we want to say that if we replace a sample with two identical samples with half the weight, the results should be the same? That's probably not mathematically true for SGD, right?

Right now I think the easiest solution might be having a tag that says that an estimator depends on the order of the data.
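To make the rounding difficulty concrete, here is a hedged sketch of one naive greedy heuristic (the function is hypothetical, not scikit-learn API):

```python
import numpy as np

def greedy_weight_balanced_folds(sample_weight, n_splits):
    """Hypothetical greedy assignment: each sample (in order) goes to
    the fold whose total weight is currently smallest."""
    fold_weights = np.zeros(n_splits)
    fold_of = np.empty(len(sample_weight), dtype=int)
    for i, w in enumerate(sample_weight):
        f = np.argmin(fold_weights)
        fold_of[i] = f
        fold_weights[f] += w
    return [np.flatnonzero(fold_of == f) for f in range(n_splits)]

# With all-ones weights this degenerates to round-robin assignment,
# *not* the contiguous blocks plain KFold produces -- one concrete way
# the reduction to the unweighted case is tricky.
print(greedy_weight_balanced_folds(np.ones(6), 3))
```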
I am not sure we want to go down the way of making the splitters aware of sample weights.
Merged master to trigger a CI run with the result of #14286. I also updated the description so we can tick off estimators and track progress.
#14286 alone is not enough to fix the common test for models in the SVM family. Maybe a stricter tol would also be required.
How about marking the failing estimators with xfail, opening a follow-up issue, and merging this? It would be helpful to have this test in master (otherwise one has to check out this branch and locally merge master to do anything about it), while if it were in master, one could simply run the new check with pytest.
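Roughly, in pytest terms (the test name, estimator list, and reason string here are illustrative, not the actual common-test code):

```python
import pytest
from sklearn.utils import all_estimators

# Hypothetical list of estimators known to fail the zero-weight check.
KNOWN_FAILURES = {'SVC', 'NuSVC', 'OneClassSVM'}

@pytest.mark.parametrize('name, Estimator', all_estimators())
def test_zero_sample_weight_is_ignored(name, Estimator):
    if name in KNOWN_FAILURES:
        pytest.xfail('zero sample_weight not yet ignored; see follow-up issue')
    # ... run the actual zero-weight invariance check here ...
```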
I continued this PR in #16507, as suggested above, which would allow merging this test into master with some estimators marked as known failures.
Adding a test following #10873.
This is a pretty simple sample_weight test asserting that a weight of 0 is equivalent to not having the samples at all.
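A minimal standalone sketch of the idea (the estimator and tolerances are illustrative; the real common test is generic over estimators):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.randint(2, size=50)

# Duplicate every sample and give the duplicates zero weight.
X2 = np.vstack([X, X])
y2 = np.hstack([y, y])
weights = np.ones(len(y) * 2)
weights[len(y):] = 0
X2, y2, weights = shuffle(X2, y2, weights, random_state=0)

est1 = LogisticRegression(tol=1e-8, max_iter=1000).fit(X, y)
est2 = LogisticRegression(tol=1e-8, max_iter=1000).fit(
    X2, y2, sample_weight=weights)

# The zero-weight duplicates should be ignored entirely.
assert np.allclose(est1.coef_, est2.coef_, atol=1e-6)
```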
I think every failure here should be considered a bug.
I wonder if, for the SGD-based algorithms, there's an issue where we shrink `w` even when we see a sample with sample weight 0.
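To make that hypothesis concrete, here is a schematic of an SGD step with L2 regularization (pure illustration, not scikit-learn's actual Cython implementation), where the shrinkage is applied for every visited sample regardless of its weight:

```python
import numpy as np

def sgd_step(w, x, y, weight, lr=0.01, alpha=1e-4):
    # Schematic squared-loss SGD step. The data-term gradient is scaled
    # by the sample weight, so it vanishes for weight == 0 ...
    grad = weight * (np.dot(w, x) - y) * x
    # ... but if the L2 shrinkage is applied for every visited sample,
    # regardless of its weight, w still changes -- so a zero-weight
    # sample is not a no-op.
    w = w * (1 - lr * alpha)
    return w - lr * grad

w = np.ones(3)
print(sgd_step(w, np.zeros(3), 0.0, weight=0.0))  # w shrank despite weight 0
```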