
TST check equivalence normalize/StandardScaler and dense/sparse in linear models#17665

Merged
glemaitre merged 55 commits intoscikit-learn:masterfrom
maikia:test_normalize_as_pipeline
Jun 26, 2020

Conversation

@maikia
Contributor

@maikia maikia commented Jun 23, 2020

towards: #3020

In linear models such as Lasso there is an option to set normalize=True. However, if fit_intercept is set to False this has no effect.
Towards deprecating normalize in the linear models altogether, we want to give users the option to first normalize with StandardScaler and then call the linear model.

These tests make sure that the two options give the same results and the same .coef_ (even though .intercept_ might differ).
The tests are run for both sparse and dense datasets.
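The dense/sparse part of the equivalence can be sketched as follows (a minimal illustration in the spirit of the tests, not the actual test code from the PR; the dataset, `alpha`, and tolerances are arbitrary, and `with_mean=False` is used so the same scaler is valid for sparse input):

```python
import numpy as np
from scipy import sparse
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, random_state=0)
X[X < 0.1] = 0  # sparsify the design matrix
X_sparse = sparse.csr_matrix(X)

# with_mean=False: centering is not possible on sparse data, so the
# linear model's fit_intercept handles the offset in both cases
model_dense = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha=0.1))
model_sparse = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha=0.1))

model_dense.fit(X, y)
model_sparse.fit(X_sparse, y)

# coefficients and intercept should agree between dense and sparse fits
assert_allclose(model_dense[1].coef_, model_sparse[1].coef_, rtol=1e-5)
assert_allclose(model_dense[1].intercept_, model_sparse[1].intercept_, rtol=1e-5)
```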

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

maikia added 3 commits June 23, 2020 13:21
…> linear_model(normalize=False) gives the same result as linear_model(normalize=True)
maikia and others added 25 commits June 23, 2020 14:36
…ormalize=False) results are the same for the sparse and the dense data: need to add other linear models as well
…_true (failing on arrays are not almost equal
Member

@glemaitre glemaitre left a comment


So we are converging :P

Only some style things left. You might want to merge master into your branch to get rid of the CircleCI error.

@glemaitre
Member

glemaitre commented Jun 25, 2020

ping @rth You should know about this issue as well. Feel free to have a look and merge the PR once the changes are done.

Member

@rth rth left a comment


Thanks for working on this @maikia !

model_dense.fit(X, y)
model_sparse.fit(X_sparse, y)

assert_allclose(model_sparse[1].coef_, model_dense[1].coef_)
Member


Do we want to also check the intercept as mentioned in #3020 (comment)?

@rth
Member

rth commented Jun 25, 2020

Also, could you please merge master in to fix the documentation CI?

@glemaitre glemaitre self-assigned this Jun 26, 2020
y_pred_sparse = model_sparse.predict(X_sparse)
assert_allclose(y_pred_dense, y_pred_sparse)

assert_allclose(model_dense[1].intercept_, model_sparse[1].intercept_)
Member


At first we were reapplying the offset on the intercept, but it seems that we always do that:

    if self.fit_intercept:
        self.coef_ = self.coef_ / X_scale
        self.intercept_ = y_offset - np.dot(X_offset, self.coef_.T)

Then the equivalence between the intercepts should be a strict equality. Does that seem correct to you @agramfort?
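The un-scaling step quoted above can be checked with plain NumPy (a standalone sketch, not the scikit-learn internals; the variable names mirror the snippet but the data is arbitrary). Fitting without an intercept on centered, scaled data and then rescaling recovers the coefficients and intercept of a direct fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 3) * [1.0, 10.0, 100.0] + [0.0, 5.0, -3.0]
y = X @ [1.0, 2.0, 3.0] + 4.0

X_offset, X_scale = X.mean(axis=0), X.std(axis=0)
y_offset = y.mean()
Xs = (X - X_offset) / X_scale

# fit without intercept on the centered/scaled data, then un-scale
reg = LinearRegression(fit_intercept=False).fit(Xs, y - y_offset)
coef = reg.coef_ / X_scale
intercept = y_offset - np.dot(X_offset, coef.T)

# direct fit on the raw data gives the same solution
ref = LinearRegression().fit(X, y)
assert np.allclose(coef, ref.coef_)
assert np.allclose(intercept, ref.intercept_)
```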

@glemaitre glemaitre removed their assignment Jun 26, 2020
Member

@glemaitre glemaitre left a comment


I think the tests look good (only one change). @agramfort @rth if you can have a look at the intercept and potentially merge the PR, it could be great.

@glemaitre
Member

And codecov is reporting bullshit :)

maikia and others added 2 commits June 26, 2020 12:55
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Member

@agramfort agramfort left a comment


LGTM

let's deprecate the normalize param now.

@glemaitre or @rth feel free to merge

@glemaitre glemaitre merged commit 1c62652 into scikit-learn:master Jun 26, 2020
@glemaitre
Member

Thanks @maikia, now you are responsible for all the tests in _coordinate_descent.py :P
Let's go for the deprecation of normalize now.

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
…near models (scikit-learn#17665)

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…near models (scikit-learn#17665)

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

4 participants