When testing the results of the linear models with `normalize` set to `True` and `sample_weight` in PR #19426, we noted that for sparse data the result is incorrect when there is a constant non-zero feature, for example:
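```python
import numpy as np
rng = np.random.RandomState(0)
n_samples, n_features = 10, 5  # illustrative sizes (assumed)
X = rng.rand(n_samples, n_features)
X[X < 0.5] = 0.
X[:, 2] = 1.  # constant non-zero feature
```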
The normalization computed for this constant column is close to 0 but, because of round-off errors, never exactly 0, so it is not replaced with 1. Therefore, dividing X (which still contains the mean) by this normalization yields very large values.
The same happens when calling `StandardScaler` with the same data and `sample_weight`, possibly because both rely on `mean_variance_axis()`.
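To make the failure mode concrete, here is a small NumPy sketch that does not call scikit-learn. It assumes a constant column equal to 1.0 whose weighted mean came out one ulp below 1.0, so its variance is pure round-off noise. `scale_exact_zero` mimics the exact-zero check described above, while `scale_with_tolerance` is a hypothetical alternative using an assumed round-off bound; it is not scikit-learn's code.

```python
import numpy as np

def scale_exact_zero(var):
    # Only exact zeros are replaced by 1, as described in the issue.
    scale = np.sqrt(var)
    scale[scale == 0.0] = 1.0
    return scale

def scale_with_tolerance(var, mean, n_samples):
    # Hypothetical rule (assumption): treat a variance that lies within
    # round-off error of zero, relative to the mean, as exactly zero.
    eps = np.finfo(np.float64).eps
    upper_bound = n_samples * eps * var + (n_samples * mean * eps) ** 2
    scale = np.sqrt(var)
    scale[var <= upper_bound] = 1.0
    return scale

n_samples = 100
# Column 0: ordinary feature. Column 1: constant feature equal to 1.0 whose
# mean is assumed to have come out one ulp below 1.0 due to round-off.
mean = np.array([0.5, np.nextafter(1.0, 0.0)])
var = np.array([0.08, (1.0 - mean[1]) ** 2])
x_row = np.array([0.7, 1.0])  # one uncentered sample, as in the sparse case

print(x_row / scale_exact_zero(var))
print(x_row / scale_with_tolerance(var, mean, n_samples))
```

With the exact-zero check the constant column is divided by about 1e-16 and blows up to roughly 9e15; with the tolerance check its scale falls back to 1, while the ordinary column keeps its usual scale.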
cc @ogrisel