Description
Describe the bug
As noted in #22910, Ridge(alpha=0, fit_intercept=True) does not give the minimal norm solution for wide data, i.e. n_features > n_samples.
Note that we do not guarantee anywhere that we provide the minimum norm solution.
Edit: Same seems to hold for LinearRegression, see #26164.
Probable Cause
For wide X, the least squares problem reads a bit differently: the solution is no longer unique, and the conventional choice is the minimum norm solution

$$\min_w \lVert w \rVert_2 \quad \text{subject to} \quad Xw = y\,.$$

With explicit intercept this becomes

$$\min_{w,\, b} \lVert (w, b) \rVert_2 \quad \text{subject to} \quad Xw + b = y\,,$$

i.e. the intercept is part of the minimized norm. This is incompatible with our mean centering approach, which solves on centered data and leaves the intercept out of the norm.
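The incompatibility can be illustrated numerically. Below is a minimal sketch (variable names are mine) comparing the minimum norm solution obtained by treating the intercept as an extra column with the solution obtained via mean centering; `np.linalg.lstsq` returns the minimum norm least squares solution, so the augmented solve minimizes the joint norm of `(w, b)`, while the centering route generally does not:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 4, 10  # wide data
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Minimum norm solution with the intercept as an extra column:
# min ||(w, b)||  subject to  X w + b = y
Xb = np.hstack([X, np.ones((n_samples, 1))])
wb = np.linalg.lstsq(Xb, y, rcond=None)[0]
w_aug, b_aug = wb[:-1], wb[-1]

# Mean centering approach: solve on centered data, recover intercept
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w_c = np.linalg.lstsq(Xc, yc, rcond=None)[0]
b_c = y.mean() - X.mean(axis=0) @ w_c

# Both interpolate y exactly ...
assert np.allclose(X @ w_aug + b_aug, y)
assert np.allclose(X @ w_c + b_c, y)
# ... but only the augmented solve minimizes the joint norm of (w, b)
assert (
    np.linalg.norm(np.r_[w_aug, b_aug])
    <= np.linalg.norm(np.r_[w_c, b_c]) + 1e-9
)
```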
Example
Details
```python
import numpy as np
from numpy.testing import assert_allclose
from scipy import linalg
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import Ridge

n_samples, n_features = 4, 12  # wide data
k = min(n_samples, n_features)
rng = np.random.RandomState(42)
X = make_low_rank_matrix(
    n_samples=n_samples, n_features=n_features, effective_rank=k
)
X[:, -1] = 1  # last column acts as intercept
U, s, Vt = linalg.svd(X)
assert np.all(s > 1e-3)  # to be sure X is not singular
U1, U2 = U[:, :k], U[:, k:]
Vt1, _ = Vt[:k, :], Vt[k:, :]
y = rng.uniform(low=-10, high=10, size=n_samples)

# w = X'(XX')^-1 y = V s^-1 U' y
coef_ols = Vt1.T @ np.diag(1 / s) @ U1.T @ y

model = Ridge(alpha=0, fit_intercept=True)
X = X[:, :-1]  # remove intercept column
intercept = coef_ols[-1]
coef = coef_ols[:-1]
model.fit(X, y)

# Check that we have found a solution => residuals = 0
assert_allclose(model.predict(X), y)
# Check that `coef`, `intercept` also provide a valid solution
assert_allclose(X @ coef + intercept, y)
# Ridge does not give the minimum norm solution. (These norms should be equal.)
assert np.linalg.norm(np.r_[model.intercept_, model.coef_]) > np.linalg.norm(
    np.r_[intercept, coef]
)
```

This last assertion passes, although the two norms should be equal. It proves that Ridge does not give the minimum norm solution.
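As a side note, the SVD expression used above for the minimum norm coefficients, `w = V s^-1 U' y`, is exactly the pseudoinverse solution `w = pinv(X) @ y`. A quick consistency check of that identity (my own sketch, not part of the report):

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
n_samples, n_features = 4, 12  # wide data
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Minimum norm solution via the pseudoinverse: w = X^+ y
w_pinv = np.linalg.pinv(X) @ y

# Same thing spelled out with the thin SVD: w = V s^-1 U' y
U, s, Vt = linalg.svd(X, full_matrices=False)
w_svd = Vt.T @ ((U.T @ y) / s)

assert np.allclose(w_pinv, w_svd)
# X has full row rank almost surely, so the solution interpolates y exactly
assert np.allclose(X @ w_pinv, y)
```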