TST Add tests for LinearRegression that sample weights act consistently by lorentzenchr · Pull Request #15554 · scikit-learn/scikit-learn

lorentzenchr · 2019-11-06T22:49:02Z

Reference Issues/PRs

Relates to #21504.

Make bugs in #15438 visible by introducing tests (with xfail).

What does this implement/fix? Explain your changes.

So far, this PR only adds tests for Ridge and LinearRegression.
Branch is based on #15530.

Any other comments?

Would be nice to have the generalizable part of those tests in sklearn/utils/estimator_checks.py:

sample_weight = 0 is equivalent to removing that sample (data row). Edit: ongoing effort in "add common test that zero sample weight means samples are ignored" #15015.
combination of sample weights and sparse input.
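For the second point (sample weights combined with sparse input), here is a minimal sketch of this kind of check. It is not the PR's test code. It assumes Ridge with fit_intercept=False and solver="lsqr", which accepts both dense and CSR input. The sketch fits on dense and sparse copies of X with the same sample_weight, then compares both fits against the closed-form weighted ridge solution (X^T W X + alpha I)^{-1} X^T W y. The data, alpha, and tolerances are arbitrary choices for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
n_samples, n_features = 30, 4
X = rng.randn(n_samples, n_features)
y = X @ rng.randn(n_features) + 0.1 * rng.randn(n_samples)
sample_weight = rng.uniform(0.5, 2.0, size=n_samples)
alpha = 1.0

# Closed-form reference for fit_intercept=False: Ridge minimizes
#   sum_i sample_weight[i] * (y[i] - X[i] @ w) ** 2 + alpha * ||w||^2,
# whose minimizer solves (X^T W X + alpha * I) w = X^T W y, W = diag(sample_weight).
XtW = X.T * sample_weight  # scales column i of X.T (i.e. sample i) by its weight
w_ref = np.linalg.solve(XtW @ X + alpha * np.eye(n_features), XtW @ y)

# Identical settings for dense and CSR input; a tight tol keeps the
# iterative lsqr solver close to the exact solution.
params = dict(alpha=alpha, fit_intercept=False, solver="lsqr", tol=1e-10)
coef_dense = Ridge(**params).fit(X, y, sample_weight=sample_weight).coef_
coef_sparse = (
    Ridge(**params)
    .fit(sparse.csr_matrix(X), y, sample_weight=sample_weight)
    .coef_
)

print("dense matches reference: ", np.allclose(coef_dense, w_ref, rtol=1e-6))
print("sparse matches reference:", np.allclose(coef_sparse, w_ref, rtol=1e-6))
```

Comparing each input format to an independent closed form, rather than only comparing dense to sparse, means that a bug shared by both code paths would still be caught.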

rth · 2019-11-07T19:00:49Z

Would be nice to have the generalizable part of those tests in sklearn/utils/estimator_checks.py:
sample_weight = 0 is equivalent to removing that sample (data row).

One was actually proposed in #15015, but it is waiting for all estimators to pass. Not sure that's the best strategy; maybe marking it as XFAIL would indeed be better.

lorentzenchr · 2020-05-10T13:24:34Z

This PR adds 4 tests to Ridge and LinearRegression:

1. sample_weight=np.ones(...) gives the same fit as sample_weight=None. This is better solved for all estimators in "Common check for sample weight invariance with removed samples" #16507, once it is merged.
2. Setting elements of sample_weight to 0 is equivalent to removing the corresponding samples. This is better solved for all estimators in "Common check for sample weight invariance with removed samples" #16507, once it is merged.
3. Scaling of sample_weight should have no effect.
   This is related to "Refactor tests for sample weights" #11316, "RFC Semantic of sample_weight in regression metrics" #15651 and "RFC Sample weight invariance properties" #15657.
4. Multiplying sample_weight by 2 is equivalent to repeating the corresponding samples twice.

I could adapt this PR to only do 3 and 4 for Ridge and LinearRegression.
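The four properties above can each be expressed as a direct comparison of two fits. The sketch below checks them for LinearRegression on small, well-conditioned dense data. The data shapes, the weights, and the helper fit_params are hypothetical and chosen for illustration; this is not the PR's actual test code. LinearRegression is used because it has no penalty term. With a penalty, such as Ridge's alpha, property 3 also depends on how that penalty is scaled.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
n_samples, n_features = 20, 3
X = rng.randn(n_samples, n_features)
y = X @ rng.randn(n_features) + 1.0 + 0.1 * rng.randn(n_samples)
sw = rng.uniform(0.5, 2.0, size=n_samples)


def fit_params(X, y, sample_weight=None):
    """Hypothetical helper: fit LinearRegression, return coef_ and intercept_ together."""
    reg = LinearRegression().fit(X, y, sample_weight=sample_weight)
    return np.append(reg.coef_, reg.intercept_)


reference = fit_params(X, y, sw)

# 1) sample_weight=np.ones(...) gives the same fit as sample_weight=None.
ones_ok = np.allclose(fit_params(X, y, np.ones(n_samples)), fit_params(X, y))

# 2) Zero weights are equivalent to removing the corresponding rows.
sw_zero = sw.copy()
sw_zero[-5:] = 0.0
zeros_ok = np.allclose(
    fit_params(X, y, sw_zero), fit_params(X[:-5], y[:-5], sw[:-5])
)

# 3) Multiplying all weights by the same positive constant has no effect.
scale_ok = np.allclose(fit_params(X, y, 3.7 * sw), reference)

# 4) Doubling a row's weight is equivalent to repeating that row twice.
sw_double = sw.copy()
sw_double[:5] *= 2
X_rep = np.concatenate([X, X[:5]])
y_rep = np.concatenate([y, y[:5]])
sw_rep = np.concatenate([sw, sw[:5]])
repeat_ok = np.allclose(
    fit_params(X, y, sw_double), fit_params(X_rep, y_rep, sw_rep)
)

print(ones_ok, zeros_ok, scale_ok, repeat_ok)
```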

rth · 2020-05-10T13:43:17Z

Thanks @lorentzenchr! I have merged #16507, which should make some of this easier.

lorentzenchr · 2020-05-10T14:28:47Z

@rth Do you know if the tests of #16507 also test sample_weight properties for sparse X? If so, I can get rid of points 1 and 2 (see comment above), at least for this PR.

rth · 2020-05-10T15:21:08Z

Sorry, @lorentzenchr I reverted #16507 in the end, there were incompatible changes to check_estimator meanwhile. New PR is in #17176.

Do you know if the tests of #16507 also test sample_weight properties for sparse X?

No, but it would definitely make sense to add them there as well (for estimators that support sparse input).

rth · 2020-06-04T11:03:11Z

New PR is in #17176.

That PR is finally merged.

There is also #17441, which would allow running common tests with non-default parameters (e.g. other values of fit_intercept, solver, etc.). For instance,

$ pytest sklearn/tests/test_common_non_default.py -k "check_sample_weights_invariance and Ridge and not (RidgeClassifier or RidgeCV or BayesianRidge)" -v
========================================================= test session starts =========================================================
platform linux -- Python 3.8.0, pytest-5.2.1, py-1.8.0, pluggy-0.13.0 -- /home/rth/miniconda3/envs/sklearn-dev/bin/python
cachedir: .pytest_cache
rootdir: /home/rth/src/scikit-learn, inifile: setup.cfg
plugins: forked-1.1.3, xdist-1.31.0
collected 58603 items / 58547 deselected / 56 selected

test_common_non_default[Ridge(fit_intercept=False)-check_sample_weights_invariance(kind=ones)] PASSED [  1%]
test_common_non_default[Ridge(fit_intercept=False)-check_sample_weights_invariance(kind=zeros)] PASSED [  3%]
test_common_non_default[Ridge(fit_intercept=False,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [  5%]
test_common_non_default[Ridge(fit_intercept=False,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [  7%]
test_common_non_default[Ridge(fit_intercept=False,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [  8%]
test_common_non_default[Ridge(fit_intercept=False,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 10%]
test_common_non_default[Ridge(fit_intercept=False,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 12%]
test_common_non_default[Ridge(fit_intercept=False,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 14%]
test_common_non_default[Ridge(fit_intercept=False,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 16%]
test_common_non_default[Ridge(fit_intercept=False,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 17%]
test_common_non_default[Ridge(fit_intercept=False,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 19%]
test_common_non_default[Ridge(fit_intercept=False,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 21%]
test_common_non_default[Ridge(fit_intercept=False,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 23%]
test_common_non_default[Ridge(fit_intercept=False,solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 25%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True)-check_sample_weights_invariance(kind=ones)] PASSED [ 26%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True)-check_sample_weights_invariance(kind=zeros)] PASSED [ 28%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 30%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [ 32%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 33%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 35%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 37%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 39%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 41%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 42%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 44%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 46%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 48%]
test_common_non_default[Ridge(fit_intercept=False,normalize=True,solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 50%]
test_common_non_default[Ridge()-check_sample_weights_invariance(kind=ones)] PASSED    [ 51%]
test_common_non_default[Ridge()-check_sample_weights_invariance(kind=zeros)] PASSED   [ 53%]
test_common_non_default[Ridge(solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 55%]
test_common_non_default[Ridge(solver='cholesky')-check_sample_weights_invariance(kind=zeros)] PASSED [ 57%]
test_common_non_default[Ridge(solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 58%]
test_common_non_default[Ridge(solver='lsqr')-check_sample_weights_invariance(kind=zeros)] PASSED [ 60%]
test_common_non_default[Ridge(solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 62%]
test_common_non_default[Ridge(solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 64%]
test_common_non_default[Ridge(solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 66%]
test_common_non_default[Ridge(solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 67%]
test_common_non_default[Ridge(solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 69%]
test_common_non_default[Ridge(solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] PASSED [ 71%]
test_common_non_default[Ridge(solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 73%]
test_common_non_default[Ridge(solver='svd')-check_sample_weights_invariance(kind=zeros)] PASSED [ 75%]
test_common_non_default[Ridge(normalize=True)-check_sample_weights_invariance(kind=ones)] PASSED [ 76%]
test_common_non_default[Ridge(normalize=True)-check_sample_weights_invariance(kind=zeros)] FAILED [ 78%]
test_common_non_default[Ridge(normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=ones)] PASSED [ 80%]
test_common_non_default[Ridge(normalize=True,solver='cholesky')-check_sample_weights_invariance(kind=zeros)] FAILED [ 82%]
test_common_non_default[Ridge(normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=ones)] PASSED [ 83%]
test_common_non_default[Ridge(normalize=True,solver='lsqr')-check_sample_weights_invariance(kind=zeros)] FAILED [ 85%]
test_common_non_default[Ridge(normalize=True,solver='sag')-check_sample_weights_invariance(kind=ones)] PASSED [ 87%]
test_common_non_default[Ridge(normalize=True,solver='sag')-check_sample_weights_invariance(kind=zeros)] FAILED [ 89%]
test_common_non_default[Ridge(normalize=True,solver='saga')-check_sample_weights_invariance(kind=ones)] PASSED [ 91%]
test_common_non_default[Ridge(normalize=True,solver='saga')-check_sample_weights_invariance(kind=zeros)] FAILED [ 92%]
test_common_non_default[Ridge(normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=ones)] PASSED [ 94%]
test_common_non_default[Ridge(normalize=True,solver='sparse_cg')-check_sample_weights_invariance(kind=zeros)] FAILED [ 96%]
test_common_non_default[Ridge(normalize=True,solver='svd')-check_sample_weights_invariance(kind=ones)] PASSED [ 98%]
test_common_non_default[Ridge(normalize=True,solver='svd')-check_sample_weights_invariance(kind=zeros)] FAILED [100%]
===== 13 failed, 43 passed, 58547 deselected in 31.16s ===========

Edit: opened #17444 about Ridge(normalize=True)
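The sketch below is a hand-rolled version of the #17441 idea, which runs the sample-weight checks across non-default parameter combinations. It uses a hypothetical helper, check_sw_invariance, that mirrors the kind=ones and kind=zeros variants in the output above. This is an assumption for illustration and not scikit-learn's own check. The sweep covers fit_intercept and the four solvers that passed both kinds in the run above. normalize is left out because recent scikit-learn versions no longer accept it. sag and saga are also left out, because they failed kind=zeros above and would need their own investigation rather than a shared tolerance.

```python
import itertools

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
n_samples, n_features = 25, 4
X = rng.randn(n_samples, n_features)
y = X @ rng.randn(n_features) + 0.5 + 0.1 * rng.randn(n_samples)


def params_of(est):
    # coef_ and intercept_ in one vector (intercept_ is 0.0 if fit_intercept=False)
    return np.append(est.coef_, est.intercept_)


def check_sw_invariance(estimator, kind):
    """Hypothetical helper mirroring the kind=ones / kind=zeros checks."""
    if kind == "ones":
        # Unit weights must give the same fit as passing no weights at all.
        fit_a = clone(estimator).fit(X, y, sample_weight=np.ones(n_samples))
        fit_b = clone(estimator).fit(X, y)
    else:  # kind == "zeros"
        # Zero weights on the last 5 rows must equal dropping those rows.
        sw = np.ones(n_samples)
        sw[-5:] = 0.0
        fit_a = clone(estimator).fit(X, y, sample_weight=sw)
        fit_b = clone(estimator).fit(X[:-5], y[:-5])
    return bool(np.allclose(params_of(fit_a), params_of(fit_b), rtol=1e-6))


results = {}
for fit_intercept, solver, kind in itertools.product(
    [True, False], ["cholesky", "lsqr", "sparse_cg", "svd"], ["ones", "zeros"]
):
    # A tight tol lets the iterative solvers (lsqr, sparse_cg) converge closely.
    est = Ridge(alpha=1.0, fit_intercept=fit_intercept, solver=solver, tol=1e-10)
    results[(fit_intercept, solver, kind)] = check_sw_invariance(est, kind)

for (fit_intercept, solver, kind), ok in results.items():
    print(f"Ridge(fit_intercept={fit_intercept}, solver={solver!r}) kind={kind}:",
          "PASSED" if ok else "FAILED")
```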

lorentzenchr · 2021-06-27T13:54:55Z

Mmh, those tests reveal some shortcomings in Ridge with sparse input, and in LinearRegression even with dense input.

lorentzenchr · 2022-10-28T09:43:57Z

#19616 added some tests, I'll rebase and clean up.

jjerphan · 2023-03-06T15:38:42Z

Hi @lorentzenchr, what is the state of this PR? Can one pursue it? 🙂

lorentzenchr · 2023-03-26T10:44:05Z

It took me longer than anticipated. The tests show some shortcomings, uncovered in dd4f742.

jjerphan

Thanks, @lorentzenchr. In your view, what is the best approach to resolve the current assertion failures?

jjerphan · 2023-03-27T08:56:47Z

sklearn/linear_model/tests/test_base.py

+        # TODO: This often fails, e.g. when calling
+        # SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest \
+        # sklearn/linear_model/tests/test_base.py\
+        # ::test_linear_regression_sample_weight_consistency
+        pass


For now, can you mark this test as xfail in this case and explain which assertion is not verified? Is this sensitive to the scaling of y[-5:]?

This fails at random, so xfail is not appropriate IMO.

I improved the comment in 80513c9 and linked to #26164.
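When a failure depends on the random seed, one option is to sweep the seeds explicitly and collect the failing ones. This is similar in spirit to running with SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all", which, as I understand it, runs tests that use the global_random_seed fixture once for each seed from 0 to 99. The sketch below is an assumption for illustration. It uses a hypothetical zero-weight check on well-posed dense data, with more remaining samples than features. It is not the PR's test and does not reproduce #26164. It only shows the sweep pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def zero_weight_matches_removal(seed, n_samples=20, n_features=3, n_zero=5):
    """Hypothetical per-seed check: zero sample weights vs. dropping those rows."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, n_features)
    y = X @ rng.randn(n_features) + rng.randn(n_samples)
    sw = rng.uniform(0.5, 2.0, size=n_samples)
    sw[-n_zero:] = 0.0
    with_zeros = LinearRegression().fit(X, y, sample_weight=sw)
    removed = LinearRegression().fit(
        X[:-n_zero], y[:-n_zero], sample_weight=sw[:-n_zero]
    )
    return bool(
        np.allclose(with_zeros.coef_, removed.coef_)
        and np.isclose(with_zeros.intercept_, removed.intercept_)
    )


# Sweep seeds 0-99 (assumed to match the "all" setting) and report every
# failing seed instead of a single pass/fail outcome.
failing_seeds = [seed for seed in range(100) if not zero_weight_matches_removal(seed)]
print("failing seeds:", failing_seeds)
```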

lorentzenchr · 2023-04-14T11:33:34Z

With #26164 opened, this is clean and ready from my side.

lorentzenchr · 2023-04-30T10:34:22Z

@jjerphan Friendly ping. CI is 🟢. This is "only" a test improvement and no functionality is changed, so a review with less scrutiny is acceptable, I guess.

jjerphan

LGTM. Thank you for your patience, @lorentzenchr.

Ideally, we could combine those tests into a single parametrized one, but for now, the corner cases and the GLM implementations make that hardly doable.

sklearn/linear_model/tests/test_ridge.py

sklearn/linear_model/tests/test_base.py

sklearn/linear_model/tests/test_ridge.py

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

lorentzenchr · 2023-05-01T14:48:40Z

Ideally, we could combine those tests into a single parametrized one, but for now, the corner cases and the GLM implementations make that hardly doable.

I thought about that. But as you say, there are too many different corner cases.

Can we merge as is with only one review, since only tests are added? Or maybe @rth wants to bring it over the line and press one or two buttons 🤞

jeremiedbb

Good to have this more extensively tested. I'd like to have this kind of test for more estimators, because the common test only checks one combination of parameters and we often discover bugs years later for the other combinations. Thanks @lorentzenchr, LGTM!


rth mentioned this pull request Nov 19, 2019

RFC Sample weight invariance properties #15657

Open

github-actions bot added the module:linear_model label Mar 2, 2020

rth mentioned this pull request May 10, 2020

Common check for sample weight invariance with removed samples #16507

Merged

lorentzenchr force-pushed the sw_tests branch from b16fdcb to 2e72d51 May 10, 2020 14:29

cmarmo added the module:test-suite everything related to our tests label Sep 15, 2020

Base automatically changed from master to main January 22, 2021 10:51

ogrisel mentioned this pull request Feb 11, 2021

MRG fix Normalize for linear models when used with sample_weight #19426

Merged

lorentzenchr changed the title ~~[WIP] Add tests that sample weights act consistently~~ [MRG] Add tests that sample weights act consistently Jun 27, 2021

lorentzenchr added the No Changelog Needed label Jun 27, 2021

Christian Lorentzen and others added 9 commits October 28, 2022 12:06

TST consistent sample weights for LinearRegression

cb7e112

Rename test, add comments

6274f20

TST consistent sample weights for Ridge

2f203e4

MNT rebase

c3a7c32

TST update

b00963a

TST improve

761d0e2

TST update test_linear_regression_sample_weight_consistency

5db13f1

CLN clean up after rebase

9712d92

CLN do not skip

2fc0657

lorentzenchr force-pushed the sw_tests branch from e3b9ef8 to 2fc0657 October 28, 2022 11:38

lorentzenchr added 2 commits March 7, 2023 22:58

Merge branch 'main' into sw_tests

99886fb

Merge branch 'main' into sw_tests

2166c45

lorentzenchr added 2 commits March 25, 2023 19:55

CLN meaningful sample weights

37964b2

CLN use global_random_seed

8b3d171

lorentzenchr changed the title ~~[MRG] Add tests that sample weights act consistently~~ TST Add tests for LinearRegression that sample weights act consistently Mar 26, 2023

lorentzenchr added 3 commits March 26, 2023 12:18

CLN sync with test_enet_sample_weight_consistency

80fcc4d

TST add TODO and skip test

dd4f742

CLN typo

8e2b301

jjerphan reviewed Mar 27, 2023


lorentzenchr added 2 commits April 11, 2023 21:38

Merge branch 'main' into sw_tests

3b2d83c

CLN add gh-issue to FIXME

80513c9

lorentzenchr mentioned this pull request Apr 13, 2023

LinearRegression with zero sample_weights is not the same as excluding those rows #26164

Closed

lorentzenchr added this to the 1.3 milestone Apr 30, 2023

jjerphan approved these changes May 1, 2023


CLN Apply suggestions from code review

59bae79

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

jeremiedbb approved these changes May 26, 2023


Merge branch 'main' into sw_tests

89a75f0

jeremiedbb enabled auto-merge (squash) May 26, 2023 14:58

jeremiedbb merged commit 42d2359 into scikit-learn:main May 26, 2023

lorentzenchr deleted the sw_tests branch June 2, 2023 07:37

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

TST Add tests for LinearRegression that sample weights act consistent…

d3e017b

…ly (scikit-learn#15554) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

Conversation

lorentzenchr commented Nov 6, 2019 • edited by jjerphan
rth commented Nov 7, 2019
lorentzenchr commented May 10, 2020
rth commented May 10, 2020 • edited
lorentzenchr commented May 10, 2020
rth commented May 10, 2020
rth commented Jun 4, 2020 • edited
lorentzenchr commented Jun 27, 2021 • edited
lorentzenchr commented Oct 28, 2022
jjerphan commented Mar 6, 2023
lorentzenchr commented Mar 26, 2023
jjerphan left a comment
jjerphan Mar 27, 2023
lorentzenchr Apr 5, 2023
lorentzenchr Apr 12, 2023
lorentzenchr commented Apr 14, 2023
lorentzenchr commented Apr 30, 2023
jjerphan left a comment
lorentzenchr commented May 1, 2023
jeremiedbb left a comment

5 participants