MRG Deprecates 'normalize' in LinearRegression (_base.py)#17743
MRG Deprecates 'normalize' in LinearRegression (_base.py)#17743ogrisel merged 141 commits intoscikit-learn:mainfrom
Conversation
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…/scikit-learn into depreciate_normalize_base
|
@rth @agramfort @glemaitre (problem with the docs should hopefully be fixed soon: #17745) |
…into depreciate_normalize_base
agramfort
left a comment
There was a problem hiding this comment.
you will also need an entry in what's new do document the deprecation
besides LTGM provided CIs are happy (doc build included)
…into depreciate_normalize_base
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…into depreciate_normalize_base
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…er test the impact of with_mean
ogrisel
left a comment
There was a problem hiding this comment.
I pushed a new commit to change the new test to see the impact of the with_mean parameter of the StandardScaler for dense inputs.
So apparently, both with_mean=True and with_mean=False work for dense data on LinearRegression. I assume the mean feature value is moved to the intercept and therefore scaling with or without mean does change the equivalence asserted in the test.
However I am not sure about how regularization will impact this if we are to write a similar test for Ridge and Lasso for instance.
|
The test failure is unrelated and reported in a dedicated issue: #19224. |
ogrisel
left a comment
There was a problem hiding this comment.
In light of the updated test, I think it's fine to keep an explicit with_mean=False in the deprecation message.
LGTM for merge once the following comment is addressed:
| ) | ||
| def test_linear_model_sample_weights_normalize_in_pipeline( | ||
| estimator, is_sparse, with_mean | ||
| ): |
There was a problem hiding this comment.
If this test is only meant to test LinearRegression it should be moved to sklearn/linear_model/tests/test_base.py. If it's meant to be extended to Ridge, Lasso... maybe it should be move to a new file, e.g. sklearn/linear_model/tests/test_linear_model.py
There was a problem hiding this comment.
If I recall I was proposing sklearn/linear_model/tests/test_common.py that is the usual way that we structure common tests for a module.
There was a problem hiding this comment.
To anticipate this question, I tried to see if this test would pass with the current code for Ridge and Lasso and actually it always fails whether with_mean is True or False on dense data and it also fails with with_mean=False on sparse data:
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-True-False] PASSED [ 12%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-True] PASSED [ 25%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-False] PASSED [ 37%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-True-False] FAILED [ 50%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-True] FAILED [ 62%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-False] FAILED [ 75%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-True] FAILED [ 87%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-False] FAILED [100%]
So the deprecation of their normalize option should not be implemented with the same message I believe.
To keep this PR focused, let's just move this test to sklearn/linear_model/tests/test_base.py for now.
There was a problem hiding this comment.
Yes @glemaitre . I answered your comment, but it must have gone lost int the flow of other comments:
#17743 (comment)
I will move it to test_base.py.
@ogrisel failing for Ridge and Lasso might indeed be a problem as it was supposed to be extended to include them in this test. Why is this the case (for their failing)?
There was a problem hiding this comment.
@glemaitre
There is no file: sklearn/linear_model/tests/test_common.py
(there is: sklearn/tests/test_common.py, hence my previous question above).
Should I create it?
There was a problem hiding this comment.
There is no file: sklearn/linear_model/tests/test_common.py
We could create one. But it's fine to keep in sklearn/linear_model/tests/test_base.py for now. You can move this test to sklearn/linear_model/tests/test_common.py in a PR that needs to reuse it for another estimator of the sklearn.linear_model module.
…into depreciate_normalize_base
|
Merged. The rest of the discussion can be handled in PRs related to Ridge and Lasso. |
|
congrats and thanks @maikia ! |
Towards: #3020
It deprecates 'normalize' in _base.py (LinearRegression)