FIX switch to 'sparse_cg' solver in Ridge when X is sparse and fitting intercept #13995
Conversation
sklearn/linear_model/ridge.py
Outdated
solver = 'sparse_cg'
if self.solver not in ['auto', 'sparse_cg']:
    warnings.warn(
        'setting solver to "sparse_cg" because X is sparse')
Better:
"solver={} does not support fitting the intercept on sparse data, "
"falling back to solver='sparse_cg'. To avoid this warning either change "
"the solver to 'sparse_cg' explicitly or set `fit_intercept=False`."
sklearn/linear_model/ridge.py
Outdated
multi_output=True, y_numeric=True)
if sparse.issparse(X) and self.fit_intercept:
    solver = 'sparse_cg'
    if self.solver not in ['auto', 'sparse_cg']:
The solver resolution (e.g. for 'auto') is normally done in `_ridge_regression`; maybe it would be better to move this there as well.
The difficulty is that, depending on fit_intercept and whether X is dense, we may or may not need to provide `_ridge_regression` with `X_offset` and `X_scale` (computed in preprocessing):
scikit-learn/sklearn/linear_model/ridge.py
Line 571 in e871a56
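To make the preprocessing issue concrete, here is a minimal numpy/scipy sketch (not scikit-learn's actual implementation) of the implicit-centering idea that lets a conjugate-gradient solver fit an intercept on sparse data: the centered Gram matrix `X_c^T X_c` is applied through a `LinearOperator` using only `X` and its column means, so `X` is never densified. All names (`matvec`, `coef_ref`, the toy data) are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.RandomState(0)
n_samples, n_features, alpha = 50, 8, 1.0
X = sparse.random(n_samples, n_features, density=0.5,
                  random_state=rng, format='csr')
y = X @ rng.randn(n_features) + 3.0   # true intercept of 3.0

# Centered normal equations, without densifying X:
# (X_c^T X_c + alpha I) w = X_c^T y_c, with X_c = X - 1 x_mean^T
x_mean = np.asarray(X.mean(axis=0)).ravel()
y_c = y - y.mean()

def matvec(v):
    Xc_v = X @ v - x_mean @ v                    # (X - 1 x_mean^T) v
    return X.T @ Xc_v - x_mean * Xc_v.sum() + alpha * v

A = LinearOperator((n_features, n_features), matvec=matvec)
b = X.T @ y_c - x_mean * y_c.sum()
coef, info = cg(A, b, atol=1e-12)
intercept = y.mean() - x_mean @ coef             # recover the intercept

# dense reference: ridge on explicitly centered data
Xd = X.toarray() - x_mean
coef_ref = np.linalg.solve(Xd.T @ Xd + alpha * np.eye(n_features),
                           Xd.T @ y_c)
```

The point is that the dense solvers can rely on `X_offset`/`X_scale` being applied to `X` directly, while the sparse path has to fold the offset into the operator as above.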
I would not have prevented someone from using 'sag' with fit_intercept=True. It's not broken per se; it just needs a lot more iterations than the default value.
In this case should there be a warning? Users may be surprised by the number of
iterations they need to set:
>>> x, y = _make_sparse_offset_regression(n_samples=20, n_features=5, random_state=0)
>>> sp_ridge = Ridge(solver='sag', max_iter=10000000, tol=1e-8).fit(sparse.csr_matrix(x), y)
>>> ridge = Ridge(solver='sag', max_iter=10000000, tol=1e-8).fit(x, y)
>>> sp_ridge.n_iter_[0]
566250
>>> ridge.n_iter_[0]
100
>>> np.allclose(sp_ridge.intercept_, ridge.intercept_, rtol=1e-3)
False
@agramfort I restored the possibility of using 'sag' and added a warning, let me know if I should remove the warning
assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
for solver in ['saga', 'lsqr', 'sag']:
    sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
    assert_warns(UserWarning, sparse.fit, X_csr, y)
We do need to match the warning message (a lot of unrelated things can raise a UserWarning), even if it means splitting checks for sag and other solvers.
Also, I wonder if it isn't better to raise an exception on an unsupported solver rather than switch solver with a warning.
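A minimal sketch of what "raise instead of switch" could look like — this is a hypothetical helper (`resolve_solver` and its exact message are mine, not scikit-learn's API): 'auto' silently falls back to 'sparse_cg', while an explicitly requested unsupported solver raises.

```python
import numpy as np
from scipy import sparse

def resolve_solver(solver, fit_intercept, X):
    # Illustrative sketch: raise for solvers that cannot fit an intercept
    # on sparse data, but keep the silent fallback for 'auto'.
    if sparse.issparse(X) and fit_intercept:
        if solver not in ('auto', 'sparse_cg'):
            raise ValueError(
                "solver='{}' does not support fitting the intercept on "
                "sparse data. Use solver='auto' or 'sparse_cg', or set "
                "fit_intercept=False.".format(solver))
        return 'sparse_cg'
    return solver
```

For example, `resolve_solver('auto', True, X_csr)` returns `'sparse_cg'`, while `resolve_solver('lsqr', True, X_csr)` raises a `ValueError`.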
thanks! I changed the warning to a `ValueError`.
glemaitre
left a comment
IMO, we can consider it as a bug fix rather than a change of default.
So we will need an entry in what's new as a bug fix and document it in model changes as well.
assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
for solver in ['saga', 'lsqr', 'sag']:
    sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
    assert_raises_regex(
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
max_iter = 50
tol = .000001
fit_intercept = True
fit_intercept = False
why do you need this @jeromedockes ? SAG fails to get a good intercept?
can you try initializing the intercept to np.mean(y_train) instead of 0 (if done like this now)?
At the moment it does fail to fit a good intercept (with the default tol and
n_iter)
the intercept is indeed initialized with zeros:
scikit-learn/sklearn/linear_model/ridge.py
Line 486 in c315bf9
but initializing it with the mean of y probably won't change much since this
mean is 0: y is always assumed to be dense and centered in preprocessing.
The mean of X is what causes the intercept to be nonzero.
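A tiny numpy sketch of this point (toy data, names mine): after the standard preprocessing, y is centered so its mean is exactly 0, and the nonzero intercept comes entirely from X's column means — which is why initializing the intercept at `np.mean(y_train)` would not help.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(30, 4) + 2.0        # large column means
w = rng.randn(4)
y = X @ w + 1.0                  # true intercept of 1.0

# preprocessing centers y, so its mean becomes 0
y_c = y - y.mean()
assert abs(y_c.mean()) < 1e-12

# the intercept is recovered from X's column means, not from y's mean:
x_mean = X.mean(axis=0)
intercept = y.mean() - x_mean @ w
```

Here `intercept` equals the true intercept of 1.0, even though the centered target the solver actually sees has mean 0.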
I was thinking that we could revert to only allowing sparse_cg for sparse data
in this PR to quickly fix the bug and unlock #13246 , and then see if support
for sparse data can be added to the sag solver in a separate PR. WDYT?
I am +1 for this path. I think that we should ensure that we give proper results. We can investigate later how to fix SAG for this case. @agramfort WDYT?
BTW one of the reasons the intercept takes many iterations to converge can be this decay
scikit-learn/sklearn/linear_model/base.py
Line 43 in 4a6264d
scikit-learn/sklearn/linear_model/base.py
Line 97 in 4a6264d
scikit-learn/sklearn/linear_model/sag.py
Line 299 in 4a6264d
dense = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
# for now only sparse_cg can fit an intercept with sparse X
for solver in ['sparse_cg']:
You can remove the for loop. We will use pytest's parametrization when we reintroduce sag.
assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
for solver in ['saga', 'lsqr', 'sag']:
    sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
    with pytest.raises(
This will be a bit more compact:

err_msg = "solver='{}' does not support".format(solver)
with pytest.raises(ValueError, match=err_msg):
    sparse.fit(X_csr, y)

sparse = Ridge(alpha=1., tol=1.e-15, solver=solver, fit_intercept=True)
assert_raises_regex(ValueError, "In Ridge,", sparse.fit, X_csr, y)
for solver in ['saga', 'lsqr', 'sag']:
    sparse = Ridge(alpha=1., solver=solver, fit_intercept=True)
Avoid calling it `sparse`. I think we sometimes have the following import: `from scipy import sparse`.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
doc/whats_new/v0.22.rst
Outdated
:pr:`13618` by :user:`Yoshihiro Uchida <c56pony>`.

- |Fix| :class:`linear_model.Ridge` now correctly fits an intercept when
  `X` is sparse, `solver="auto"` and `fit_intercept=True`. Setting the solver to
explain that it is because the default solver is now sparse_cg
sklearn/linear_model/ridge.py
Outdated
'"sag" solver requires many iterations to fit '
'an intercept with sparse inputs. Either set the '
'solver to "auto" or "sparse_cg", or set a low '
'"tol" and a high "max_iter".'
What bothers me is that you will get the warning whatever values you use for tol or max_iter. Maybe only warn if the parameters used are the defaults?
# tol and max_iter, sag should raise a warning and is handled in
# test_ridge_fit_intercept_sparse_sag
# "auto" should switch to "sparse_cg"
dense_ridge = Ridge(alpha=1., solver='sparse_cg', fit_intercept=True)
`sparse_cg` can fit both sparse and dense data. Since both "auto" and "sparse_cg" should result in "sparse_cg" being used when X is sparse, the reference is "sparse_cg" fitted on dense data, and Ridge(solver="auto") and Ridge(solver="sparse_cg"), fitted on sparse data, are compared to it.
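The comparison described above can be sketched as follows — a hedged, self-contained version of the test idea, with hypothetical toy data (the variable names and tolerances are mine, not the PR's actual test):

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

# Toy data with nonzero column means, so the intercept matters.
rng = np.random.RandomState(0)
X = rng.rand(40, 6) + 1.0
y = X @ rng.randn(6) + 2.0
X_csr = sparse.csr_matrix(X)

# reference: sparse_cg fitted on dense data
dense_ridge = Ridge(alpha=1., tol=1e-10, solver='sparse_cg',
                    fit_intercept=True).fit(X, y)

# both 'auto' and 'sparse_cg' on sparse data should match the reference
for solver in ('auto', 'sparse_cg'):
    sp_ridge = Ridge(alpha=1., tol=1e-10, solver=solver,
                     fit_intercept=True).fit(X_csr, y)
    assert np.allclose(sp_ridge.coef_, dense_ridge.coef_, rtol=1e-3)
    assert np.allclose(sp_ridge.intercept_, dense_ridge.intercept_,
                       rtol=1e-3)
```

This is exactly the dense/sparse consistency property that #13246's estimator check verifies.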
y = 0.5 * X.ravel()

clf1 = Ridge(tol=tol, solver='sag', max_iter=max_iter,
             fit_intercept=False,
I would revert changes to test if these are not necessary. Just catch the warning if need be.
I reverted them; there is no warning because those tests do not use the default tol and max_iter.
Co-Authored-By: Alexandre Gramfort <alexandre.gramfort@m4x.org>
…edockes/scikit-learn into fix_ridge_solver_selection
# for now only sparse_cg can fit an intercept with sparse X with default
# tol and max_iter, sag should raise a warning and is handled in
# test_ridge_fit_intercept_sparse_sag
# "auto" should switch to "sparse_cg"
Is this comment still relevant? I don't see sag warnings caught here.
If you update the comment can you write what you answered to me just below? thanks
Thanks! I updated the comment. You don't see warnings here because sag's behaviour in this configuration is tested separately in test_ridge_fit_intercept_sparse_sag.
agramfort
left a comment
Let's wait for another approval before merging. thx @jeromedockes
glemaitre
left a comment
A single nitpick and then I will merge. LGTM.
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks @jeromedockes

thanks a lot for the help and advice @glemaitre @agramfort and @rth
`Ridge` is failing the check introduced in #13246 to verify that estimators produce the same results for sparse and dense data. This PR enforces selecting the `sparse_cg` solver when X is sparse and `fit_intercept=True`, since this solver is currently the only one that correctly fits an intercept with the Ridge default `tol` and `max_iter` when X is sparse.

@agramfort @glemaitre @ogrisel