
TST check equivalence normalize/StandardScaler and dense/sparse in linear models#17665

Merged
glemaitre merged 55 commits intoscikit-learn:masterfrom
maikia:test_normalize_as_pipeline
Jun 26, 2020

Conversation

@maikia
Contributor

@maikia maikia commented Jun 23, 2020

towards: #3020

In linear models such as Lasso there is an option to set normalize=True. However, if fit_intercept is set to False this has no effect.
Towards deprecating normalize in the linear models altogether, we want to give users the option to first normalize with StandardScaler and then call the linear model.

These tests make sure that the two options give the same results and the same .coef_ (even though .intercept_ might differ).
The tests are run for both sparse and dense datasets.
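The dense/sparse part of the equivalence can be sketched as follows (a minimal illustration in the spirit of the tests, not the actual test code from the PR; the dataset, `alpha`, and tolerances are arbitrary, and `with_mean=False` is used so the same scaler is valid for sparse input):

```python
import numpy as np
from scipy import sparse
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, random_state=0)
X[X < 0.1] = 0  # sparsify the design matrix
X_sparse = sparse.csr_matrix(X)

# with_mean=False: centering is not possible on sparse data, so the
# linear model's fit_intercept handles the offset in both cases
model_dense = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha=0.1))
model_sparse = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha=0.1))

model_dense.fit(X, y)
model_sparse.fit(X_sparse, y)

# coefficients and intercept should agree between dense and sparse fits
assert_allclose(model_dense[1].coef_, model_sparse[1].coef_, rtol=1e-5)
assert_allclose(model_dense[1].intercept_, model_sparse[1].intercept_, rtol=1e-5)
```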

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

maikia added 3 commits June 23, 2020 13:21
…> linear_model(normalize=False) gives the same result as linear_model(normalize=True)
maikia and others added 25 commits June 23, 2020 14:36
…ormalize=False) results are the same for the sparse and the dense data: need to add other linear models as well
…_true (failing on arrays are not almost equal
Member

@glemaitre glemaitre left a comment


So we are converging :P

Only some style things left. You might want to merge master into your branch to get rid of the CircleCI error.

@glemaitre
Member

glemaitre commented Jun 25, 2020

ping @rth You should know about this issue as well. Feel free to have a look and merge the PR once the changes are done.

Member

@rth rth left a comment


Thanks for working on this @maikia !

model_dense.fit(X, y)
model_sparse.fit(X_sparse, y)

assert_allclose(model_sparse[1].coef_, model_dense[1].coef_)
Member


Do we want to also check the intercept as mentioned in #3020 (comment)?

@rth
Member

rth commented Jun 25, 2020

Also, could you please merge master in to fix the documentation CI?

@glemaitre glemaitre self-assigned this Jun 26, 2020
y_pred_sparse = model_sparse.predict(X_sparse)
assert_allclose(y_pred_dense, y_pred_sparse)

assert_allclose(model_dense[1].intercept_, model_sparse[1].intercept_)
Member


At first we were reapplying the offset on the intercept, but it seems that we always do that:

    if self.fit_intercept:
        self.coef_ = self.coef_ / X_scale
        self.intercept_ = y_offset - np.dot(X_offset, self.coef_.T)

Then the equivalence between the intercepts should be a strict equality. Does that seem correct to you @agramfort?
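The un-scaling step quoted above can be checked with plain NumPy (a standalone sketch, not the scikit-learn internals; the variable names mirror the snippet but the data is arbitrary). Fitting without an intercept on centered, scaled data and then rescaling recovers the coefficients and intercept of a direct fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 3) * [1.0, 10.0, 100.0] + [0.0, 5.0, -3.0]
y = X @ [1.0, 2.0, 3.0] + 4.0

X_offset, X_scale = X.mean(axis=0), X.std(axis=0)
y_offset = y.mean()
Xs = (X - X_offset) / X_scale

# fit without intercept on the centered/scaled data, then un-scale
reg = LinearRegression(fit_intercept=False).fit(Xs, y - y_offset)
coef = reg.coef_ / X_scale
intercept = y_offset - np.dot(X_offset, coef.T)

# direct fit on the raw data gives the same solution
ref = LinearRegression().fit(X, y)
assert np.allclose(coef, ref.coef_)
assert np.allclose(intercept, ref.intercept_)
```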

@glemaitre glemaitre removed their assignment Jun 26, 2020
Member

@glemaitre glemaitre left a comment


I think the tests look good (only one change). @agramfort @rth if you can have a look at the intercept and potentially merge the PR, it could be great.

@glemaitre
Member

And codecov is reporting bullshit :)

maikia and others added 2 commits June 26, 2020 12:55
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Member

@agramfort agramfort left a comment


LGTM

let's deprecate the normalize param now.

@glemaitre or @rth feel free to merge

@glemaitre glemaitre merged commit 1c62652 into scikit-learn:master Jun 26, 2020
@glemaitre
Member

Thanks @maikia, now you are responsible for all the tests in _coordinate_descent.py :P
Let's go for the deprecation of normalize now.

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jul 17, 2020
…near models (scikit-learn#17665)

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…near models (scikit-learn#17665)

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

4 participants