
linear_model with normalize and StandardScaler lead to faulty results with weighted constant features #19450

@maikia

Description


While testing linear models with normalize set to True and sample_weight in PR #19426, we noticed that the result is incorrect for sparse data when there is a constant non-zero feature, for example:

X = rng.rand(n_samples, n_features)
X[X < 0.5] = 0.
X[:, 2] = 1.

The normalization factor for the constant feature is close to 0 but, because of round-off errors, never exactly 0, so it is not replaced with 1.

Therefore, when X (which still has its mean) is divided by this normalization factor, the resulting values are very large.

The same happens when StandardScaler is called with the same data and sample_weight.

This is possibly because both use mean_variance_axis().
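To illustrate the mechanism, here is a minimal NumPy-only sketch. It does not call mean_variance_axis() or any scikit-learn code, and it uses dense arrays for simplicity. The helper name `safe_weighted_scale`, the array sizes, the seed, and the tolerance formula are all assumptions made for this illustration, not the library's actual implementation. It computes a weighted variance per column and treats a column as constant when its variance is negligible relative to its squared mean, instead of testing for an exact 0.

```python
import numpy as np


def safe_weighted_scale(X, sample_weight):
    """Hypothetical helper: weighted std per column, with near-zero
    variances (constant features) mapped to a scale of 1."""
    n_samples = X.shape[0]
    w = sample_weight / sample_weight.sum()

    # Weighted mean and variance of each column.
    mean = w @ X
    var = w @ (X - mean) ** 2

    # Assumed tolerance: round-off in the weighted mean grows roughly with
    # n_samples * eps, so anything below this bound is treated as zero.
    eps = np.finfo(X.dtype).eps
    is_constant = var <= n_samples * eps * mean ** 2

    scale = np.sqrt(var)
    scale[is_constant] = 1.0
    return scale, var


rng = np.random.RandomState(0)
n_samples, n_features = 100, 5  # hypothetical sizes

X = rng.rand(n_samples, n_features)
X[X < 0.5] = 0.
X[:, 2] = 1.  # constant non-zero feature
sample_weight = rng.rand(n_samples)

scale, var = safe_weighted_scale(X, sample_weight)

# X is deliberately not centered here, mimicking the sparse case.
X_scaled = X / scale
print("variance of the constant column:", var[2])
print("scale used for the constant column:", scale[2])
```

With an exact `scale == 0` check in place of the tolerance, a variance that is tiny but non-zero would pass through unchanged, and dividing the uncentered constant column by its square root would produce the very large values described above.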

cc @ogrisel
