TST Add tests for LinearRegression that sample weights act consistently#15554
TST Add tests for LinearRegression that sample weights act consistently#15554jeremiedbb merged 20 commits intoscikit-learn:mainfrom
Conversation
One was actually proposed in #15015 but is waiting for all estimators to pass. Not sure it's the best strategy, maybe marking it as XFAIL would indeed be better. |
|
This PR adds 4 tests to
I could adapt this PR to only do 3 and 4 for |
|
Thanks @lorentzenchr ! I have merged #16507 which should make some of this easier. |
|
Sorry, @lorentzenchr I reverted #16507 in the end, there were incompatible changes to check_estimator meanwhile. New PR is in #17176.
No but it would definitely makes sense to add them there as well (for estimators that support it). |
That PR is finally merged. There is also #17441 which would allow running common tests with non default parameters (i.e. other values of fit_intercept, solver, etc). For instance, Edit: opened #17444 about |
|
Mmh, those tests reveal some shortcomings in |
|
#19616 added some tests, I'll rebase and clean up. |
e3b9ef8 to
2fc0657
Compare
|
Hi @lorentzenchr, what is the state of this PR? Can one pursue it? 🙂 |
|
It took me longer than anticipated. The tests show some shortcomings, uncovered in dd4f742. |
jjerphan
left a comment
There was a problem hiding this comment.
Thank @lorentzenchr. To you, what is the best approach to resolve the current assertion failures?
| # TODO: This often fails, e.g. when calling | ||
| # SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest \ | ||
| # sklearn/linear_model/tests/test_base.py\ | ||
| # ::test_linear_regression_sample_weight_consistency | ||
| pass |
There was a problem hiding this comment.
For now, can you mark this test as xfail in this case and explain which assertion is not verified? Is this sensitive to the scaling of y[-5:]?
There was a problem hiding this comment.
This fails at random, so xfail is not appropriate IMO.
|
With #26164 opened, this is clean and ready from my side. |
|
@jjerphan Friendly ping. CI 🟢 This is "only" a test improvement, no functionality is changed, so a review with less scrutiny is acceptable, I guess. |
jjerphan
left a comment
There was a problem hiding this comment.
LGTM. Thank you for your patience, @lorentzenchr.
Ideally, we could combine those tests into a single parametrized one, for now, the corner cases and the GLM implementations make it hardly doable.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
I thought about that. But as you say, there are too many different corner cases. Can we merge as is with only one review as only tests are added? Or maybe @rth wants to bring it over the line and press one or two buttons🤞 |
jeremiedbb
left a comment
There was a problem hiding this comment.
Good to have this more extensively tested. I'd like to have such kind of test for more estimators because the common test just check for 1 combination and we often discover bugs years later for the other combinations. Thanks @lorentzenchr, LGTM !
…ly (scikit-learn#15554) Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
Relates to #21504.
Make bugs in #15438 visible by introducing tests (with xfail).
What does this implement/fix? Explain your changes.
So far, this PR only adds tests for
RidgeandLinearRegression.Branch is based on #15530.
Any other comments?
Would be nice to have the generalizable part of those tests in
sklearn/utils/estimator_checks.py:sample_weight = 0equivalent of removing that sample (data row). Edit: Ongoing effort in add common test that zero sample weight means samples are ignored #15015