[MRG] FIX sample_weight invariance for linear models#19616
[MRG] FIX sample_weight invariance for linear models#19616rth merged 14 commits intoscikit-learn:mainfrom
Conversation
|
I pushed more tests to highlight all the remaining problems. In particular I added I would need more time to write the weighted objective functions for all models ( But feel free to give it a try without waiting for me. |
@ogrisel As opposed to benchmarking, there is no dark margic involved in this: This objective function of Enet ist invariant to rescaling of sample weights. We use this freedom: By a lucky hunch we end up with |
|
@lorentzenchr do you have a bit of bandwidth to take over here? |
I'll try. |
|
This whole |
|
@ogrisel @agramfort What do you think? Does this solve the issue? Or is 599a63f a show stopper for you? I think it's ok and we want to get rid of |
agramfort
left a comment
There was a problem hiding this comment.
thx @lorentzenchr for cleaning up this mess 🙏
|
@ogrisel Are you fine with the current status? If so let's count each of our reviews as +1/2 which adds up to +1 => merge:smirk: |
lorentzenchr
left a comment
There was a problem hiding this comment.
LGTM (may count as a half review:smirk:)
rth
left a comment
There was a problem hiding this comment.
Thanks a lot @ogrisel @lorentzenchr !
Sorry for the lack of reply. I guess I unconsciously ignored the notifications related to this cursed PR. We probably still have bugs with |
This is a tentative fix for #17444 but doing so breaks a test added as part of #19426.
The new normalize in base model uses
sample_weight.sum()instead ofX.shape[0]to scaleX_var. This effectively fixes the problem of #17444 as can be seen in the new test:sklearn.linear_model/tests/test_ridge.py::test_sample_weight_invariance.However that requires updating the
_scale_alpha_inplaceto maketest_linear_model_sample_weights_normalize_in_pipelinepass for ridge.However now we get this test that fails for Lasso / ElasticNet. I suspect this is because of this dark magic:
scikit-learn/sklearn/linear_model/_coordinate_descent.py
Lines 795 to 807 in 1045d16
but I am not sure what is the correct fix.
/cc @maikia @agramfort @rth.