[WIP] Common test for sample weight #5461
eickenberg wants to merge 2 commits into scikit-learn:master from
Conversation
…ck whether test should apply to all of the estimators
you can use has_fit_parameter in utils.validation
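A quick sketch of how that helper could gate the common test (the estimator choices here are just illustrative, not from the PR):

```python
# Sketch: skip estimators whose fit() does not accept sample_weight.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import has_fit_parameter

# LogisticRegression.fit accepts sample_weight; KNeighborsClassifier.fit does not.
print(has_fit_parameter(LogisticRegression(), "sample_weight"))   # True
print(has_fit_parameter(KNeighborsClassifier(), "sample_weight"))  # False
```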
Oops just realized this is a
For SGD it could be a convergence issue.
Actually, sample weight support in scorers is fine.
For random forest, and for methods using randomization in general, you should make sure to use the same random_state. Finally, because of numerical issues, it may happen that trees differ slightly, though the top parts should be consistent.
random_state should be enforced here, if it is a parameter of Estimator.
There is the set_random_state helper for that.
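A sketch of that helper in use (assumption: the import path has moved between releases, historically sklearn.utils.testing, sklearn.utils._testing in recent versions):

```python
# Sketch: set_random_state assigns a deterministic value to every
# random_state parameter of an estimator, so two fits become comparable.
from sklearn.ensemble import RandomForestClassifier

try:  # recent scikit-learn
    from sklearn.utils._testing import set_random_state
except ImportError:  # older releases
    from sklearn.utils.testing import set_random_state

clf_a = RandomForestClassifier()
clf_b = RandomForestClassifier()
set_random_state(clf_a, random_state=0)
set_random_state(clf_b, random_state=0)
# Both estimators now carry the same random_state value.
print(clf_a.random_state == clf_b.random_state)  # True
```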
Whoops, I was confused about the bootstrapping. Obviously it's different.
Interestingly
Oh, I had assumed the random state was fixed...
This is because of the subsampling. Discussing with @arjoly and @agramfort yesterday: the random nature of the subsampling causes the augmented data to be split in ways that are impossible with the weighted data.
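A minimal illustration of the point (my own sketch, not from the PR): with bootstrapping, duplicating rows changes the pool being resampled, so the realized bootstrap draws differ from the sample_weight fit even under the same random_state.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Duplicating every row doubles the pool the bootstrap draws from...
rf_dup = RandomForestClassifier(n_estimators=10, random_state=0).fit(
    np.repeat(X, 2, axis=0), np.repeat(y, 2)
)
# ...while sample_weight=2 keeps the original 4-row pool and reweights it.
rf_w = RandomForestClassifier(n_estimators=10, random_state=0).fit(
    X, y, sample_weight=np.full(4, 2.0)
)
# The two forests are built from different bootstrap samples, so their
# individual trees are generally not identical, even though the two
# fits are "equivalent" in expectation.
```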
Oh, sorry, @glouppe, you had already mentioned that.
So the question becomes:
Any opinions on this?
Well, you definitely have to fix the random state. Otherwise even the same parameters won't produce the same results ;)
OK. @ainafp is taking over this PR.
Superseded by #5515
Trying to address #5444
I wrote a common test that goes through all classifiers and regressors and checks whether sample weights correspond to data augmentation. If the coef_ attribute is available, it is also compared. The estimators that fail this test are the following:
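The idea can be sketched roughly as follows (a simplified stand-in for the PR's actual test; Ridge is just an example of an estimator expected to pass):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.randn(20, 3)
y = rng.randn(20)
weights = rng.randint(1, 4, size=20)  # integer weights so duplication is exact

# Data augmentation: repeat row i exactly weights[i] times.
X_aug = np.repeat(X, weights, axis=0)
y_aug = np.repeat(y, weights)

est_w = Ridge().fit(X, y, sample_weight=weights)
est_aug = Ridge().fit(X_aug, y_aug)

# For an estimator with correct sample_weight support, the coef_
# attributes of the two fits agree up to numerical tolerance.
assert np.allclose(est_w.coef_, est_aug.coef_)
```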