Fix Ridge sparse + sample_weight + intercept#22899
Fix Ridge sparse + sample_weight + intercept#22899jeremiedbb merged 8 commits intoscikit-learn:mainfrom
Conversation
lorentzenchr
left a comment
There was a problem hiding this comment.
Could you also parametrize test_ridge_sample_weights and test_ridge_sample_weight_invariance for sparse input?
|
|
||
|
|
||
| @pytest.mark.parametrize("solver", ["sparse_cg", "sag"]) | ||
| def test_ridge_sample_weights_dense_sparse(solver, global_random_seed): |
There was a problem hiding this comment.
Could this be merged with test_ridge_fit_intercept_sparse (and maybe also test_ridge_fit_intercept_sparse)? (take the best part of both).
It seems kind of duplicate.
There was a problem hiding this comment.
Right, I merged them into a single test. It allowed to find out that lbfgs also had the same issue. I fixed it as well
|
This is great as it will resolve a long existing, severe bug! |
| @pytest.mark.parametrize("solver", ["sparse_cg", "lbfgs", "auto"]) | ||
| def test_ridge_fit_intercept_sparse(solver): | ||
| @pytest.mark.parametrize("with_sample_weight", [True, False]) | ||
| def test_ridge_fit_intercept_sparse(solver, with_sample_weight, global_random_seed): |
There was a problem hiding this comment.
tested with "all" seeds
|
@agramfort @rth @TomDLT may be interested. |
| For now only sparse_cg and lbfgs can correctly fit an intercept | ||
| with sparse X with default tol and max_iter. | ||
| 'sag' is tested separately in test_ridge_fit_intercept_sparse_sag because it | ||
| requires more iterations and should raise a warning if default max_iter is used. |
There was a problem hiding this comment.
I pushed a new commit to make this true with sample_weight != None. I tested it with "all" global_random_seed.
ogrisel
left a comment
There was a problem hiding this comment.
I had a shallow look at the code changes and it looks ok-ish but I did not check the maths. I just trust the updated tests.
+1 for merge once the CI is green.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Fixes #15438
Same issue as in LinearRegression: in sparse X_offset needs the sample_weight rescaling.
The issue appears with solver = 'sparse-cg' and solver = 'lbfgs'.
I noticed that there's also an issue for 'sag' solver, but I'm not familiar with the code at all. Any help would be greatly appreciated. We can also leave that for a separate PR.ping @lorentzenchr