BUG unpenalized Ridge does not give minimum norm solution #22947

@lorentzenchr

Describe the bug

As noted in #22910, Ridge(alpha=0, fit_intercept=True) does not give the minimum norm solution for wide data, i.e. n_features > n_samples.

Note that we do not guarantee anywhere that we provide the minimum norm solution.

Edit: Same seems to hold for LinearRegression, see #26164.

Probable Cause

For wide X, the least squares problem reads a bit differently: $\min \|w\|_2$ subject to $Xw = y$, with solution $w = X'(XX')^{-1} y$, see e.g. http://ee263.stanford.edu/lectures/min-norm.pdf.
With an explicit intercept $w_0$, this reads $w = X'(XX' + 1 1')^{-1} y$ and $w_0 = 1'(XX' + 1 1')^{-1} y$, where $1$ is a column vector of ones.
This is incompatible with our mean centering approach.
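
To make the mismatch concrete, here is a minimal NumPy sketch that evaluates the closed form above and compares it with a plain mean centering procedure. The centering part is an assumption for illustration (center `X` and `y`, take the minimum norm solution on the centered data, then recover the intercept); it is not meant to reproduce the actual scikit-learn code path, and the data and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 4, 11  # wide data
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
ones = np.ones(n_samples)

# Closed form with explicit intercept:
# w = X'(XX' + 11')^-1 y and w0 = 1'(XX' + 11')^-1 y
z = np.linalg.solve(X @ X.T + np.outer(ones, ones), y)
w, w0 = X.T @ z, ones @ z

# Cross-check: minimum norm solution of the augmented system [X, 1] [w; w0] = y
X_aug = np.column_stack([X, ones])
w_aug = np.linalg.pinv(X_aug) @ y

# Mean centering sketch (assumed procedure, not scikit-learn's implementation):
# minimum norm solution on centered data, intercept recovered afterwards.
# The centered matrix is rank deficient, hence the explicit rcond cutoff.
X_mean, y_mean = X.mean(axis=0), y.mean()
w_c = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=1e-10)[0]
w0_c = y_mean - X_mean @ w_c

norm_closed = np.linalg.norm(np.r_[w0, w])
norm_centered = np.linalg.norm(np.r_[w0_c, w_c])
print(norm_closed, norm_centered)
```

Both `(w, w0)` and `(w_c, w0_c)` interpolate `y`, but only the first minimizes the joint norm of coefficients and intercept; the centered variant minimizes the norm of the coefficients alone.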

Example

import numpy as np
from numpy.testing import assert_allclose
from scipy import linalg
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import Ridge

n_samples, n_features = 4, 12  # wide data
k = min(n_samples, n_features)
rng = np.random.RandomState(42)
X = make_low_rank_matrix(
    n_samples=n_samples, n_features=n_features, effective_rank=k, random_state=rng
)
X[:, -1] = 1  # last column acts as intercept
U, s, Vt = linalg.svd(X)
assert np.all(s > 1e-3)  # to be sure X is not singular
U1, U2 = U[:, :k], U[:, k:]
Vt1, _ = Vt[:k, :], Vt[k:, :]
y = rng.uniform(low=-10, high=10, size=n_samples)
# w = X'(XX')^-1 y = V s^-1 U' y
coef_ols = Vt1.T @ np.diag(1 / s) @ U1.T @ y

model = Ridge(alpha=0, fit_intercept=True)
X = X[:, :-1]  # remove intercept
intercept = coef_ols[-1]
coef = coef_ols[:-1]
model.fit(X, y)

# Check that we have found a solution => residuals = 0
assert_allclose(model.predict(X), y)
# Check that `coef`, `intercept` also provide a valid solution
assert_allclose(X @ coef + intercept, y)

# Ridge does not give the minimum norm solution. (The two norms should be equal.)
np.linalg.norm(np.r_[model.intercept_, model.coef_]) > np.linalg.norm(
    np.r_[intercept, coef]
)

This last statement should be False. It shows that Ridge does not give the minimum norm solution.
