TST tight tests for GLMs by lorentzenchr · Pull Request #23619 · scikit-learn/scikit-learn

lorentzenchr · 2022-06-14T16:47:15Z

Reference Issues/PRs

This is similar to #22910, but for GLMs instead of Ridge.

What does this implement/fix? Explain your changes.

Very tight test suite for penalized and unpenalized GLMs.

It also modifies the lbfgs options for better convergence (such that it passes the new tests). I think this does not need a changelog entry, but it modifies the model (it may need more iterations, but produce more precise coefficients).

Any other comments?

Prerequesite for #23314.

Locally, all tests pass with SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -x -Werror sklearn/linear_model/_glm/tests/test_glm.py.

ogrisel · 2022-06-15T08:17:54Z

I think we should document the change in the internal LBFGS configuration to document the improved convergence and maybe suggest the user to adapt the tol parameter accordingly if they do not require tight convergence.

sklearn/linear_model/_glm/glm.py

lorentzenchr · 2022-06-15T09:57:21Z

I think we should document the change in the internal LBFGS configuration to document the improved convergence

I'll add a changelog entry.

maybe suggest the user to adapt the tol parameter accordingly if they do not require tight convergence.

No objections in general. I just don't think it's necessary as our default tol=1e-4 is quite large.

ogrisel

Overall I like the idea and I am ok to change the default parameters of the lbfgs solver to be stricter to allow more precise convergence when setting low tol values.

My main concern is the expectations we put behind the use of xfail in our tests. I think lbfgs has no reason to converge to the minimum norm solution for unpenalized problems. We should therefore not assume that this is a "bug" of lbfgs. See the inline comments for more details.

sklearn/linear_model/_glm/tests/test_glm.py

sklearn/linear_model/_glm/glm.py

sklearn/linear_model/_glm/tests/test_glm.py

ogrisel

Assuming CI is green, this LGTM.

Maybe @agramfort would be interested in giving this a second review?

agramfort

just 2 nitpicks

thx @lorentzenchr

sklearn/linear_model/_glm/tests/test_glm.py

jjerphan

Thank you, @lorentzenchr.

A first pass before getting back on solvers.

sklearn/linear_model/_glm/glm.py

sklearn/linear_model/_glm/tests/test_glm.py

sklearn/linear_model/tests/test_ridge.py

lorentzenchr · 2022-06-18T12:16:50Z

I could make the CI work in 3458c39. It passes, but on every setting/system several warnings are triggered. Locally, I do not get any error (and I ran it with -Werror).

test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start

This reverts commit 99f4cf9.

lorentzenchr · 2022-06-18T13:40:28Z

In 79ec862 I increased the maximum number of linesearch steps in LBFGS from 40 to 50. This eliminated most convergence warnings:

sklearn.exceptions.ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.
               
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html

The following plattforms still have warnings:

Linux ubuntu_atlas

Details

System:
    python: 3.8.10 (default, Mar 15 2022, 12:22:08)  [GCC 9.4.0]
executable: /home/vsts/work/1/s/testvenv/bin/python
   machine: Linux-5.13.0-1029-azure-x86_64-with-glibc2.29

Python dependencies:
      sklearn: 1.2.dev0
          pip: 20.0.2
   setuptools: 44.0.0
        numpy: 1.17.4
        scipy: 1.3.3
       Cython: 0.29.30
       pandas: None
   matplotlib: 3.1.2
       joblib: 1.0.0
threadpoolctl: 2.0.0

Windows py38_pip_openblas_32bit

Details

System:
    python: 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)]
executable: D:\a\1\s\testvenv\Scripts\python.exe
   machine: Windows-10-10.0.20348-SP0

Python dependencies:
      sklearn: 1.2.dev0
          pip: 21.1.1
   setuptools: 56.0.0
        numpy: 1.22.4
        scipy: 1.8.1
       Cython: 0.29.30
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Summary: I think this is good to go.

ogrisel · 2022-06-20T08:47:22Z

@jeremiedbb we have a problem with the "[all random seeds]" CI config as can be seen in 4fd1d9b which triggers:

INTERNALERROR> OSError: cannot send to <Channel id=1 closed>

it might the problem described in this recent-ish comment on a very old gist from 2015:

https://gist.github.com/khamidou/6b7e8702f8fc0e8dd251?permalink_comment_id=4080170

I encountered this issue in the context of working with Python and non-Python test files. When using a metafunc to parametrize values in Python tests for a specific test run, this error is thrown due to the presence of non-Python test files.
The workaround is to exclude the non-Python test files by using pytest_ignore_collect.

I am not sure what the non-Python test files could be in our case. Let me try to run "[all random seeds]" without pytest-xdist on this PR.

The original problem from 2015 related to execnet 1.2. However nowadays we use execnet 1.9 so the original workaround to pin execnet to 1.1 is probably out of date.

test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start

ogrisel · 2022-06-20T09:04:40Z

The INTERNALERROR we observed with pytest-xdist seems to go away when running the "all random seeds" tests sequentially (but then everything is very slow). Maybe the problem is related to the very large number of tests we generate when we enable "all random seeds"?

collected 114068 items / 104125 deselected / 2 skipped / 9943 selected

EDIT: the sequential [all random seeds] run was successful. I will push another commit to re-enable pytest-xdist (without running all random seeds).

ogrisel · 2022-06-20T15:53:32Z

I have found the source of the lack of convergence to the minimum norm solution when fit_intercept=True: it's because we do not zero init the intercept in this case.

I am working on a PR to this PR that does not try to use the smart intercept init and simplify the tests accordingly. I think converging to the minimum norm solution is a nice inductive bias we should keep if possible.

Edit: done in lorentzenchr#7

ogrisel · 2022-06-24T13:06:10Z

Since lorentzenchr#7 will require more work, I think we can already merge this PR to main as it is already a net benefit and we can investigate the issue of intercept init separately.

This will decrease the diff size for the follow-up PRs on the new newton solvers.

into glm_tests

lorentzenchr · 2022-06-27T18:06:18Z

All green. Maybe @jjerphan might have a short look and hit the merge button? This is 99% tests and the one actual change is in the options of lbfgs which lets it pass the tests.

jjerphan

LGTM up to minor suggestions. Thank you, @lorentzenchr.

Looking forward to further work on the suite of solvers.

sklearn/linear_model/_glm/tests/test_glm.py

doc/whats_new/v1.2.rst

jjerphan · 2022-06-29T14:12:43Z

Still LGTM. I will let @ogrisel merge after #23619 (comment) is resolved.

test_glm_regression_hstacked_X

ogrisel · 2022-06-29T17:19:49Z

All green, the xfail was no longer needed. Let's merge.

lorentzenchr · 2022-06-29T20:34:17Z

@ogrisel Thanks for the polishing at the end.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

lorentzenchr added 4 commits June 14, 2022 00:01

TST add rigoros test for GLMs

aa81fb5

TST improve test_warm_start

cd06ba7

ENH improve lbfgs options for better convergence

a27c7f9

CLN fix test_warm_start

a5c1fc0

github-actions bot added the module:linear_model label Jun 14, 2022

lorentzenchr changed the title ~~Glm tight tests for GLMs~~ TST tight tests for GLMs Jun 14, 2022

lorentzenchr mentioned this pull request Jun 14, 2022

FEA add Cholesky based Newton solver to GLMs #23314

Closed

lorentzenchr added the No Changelog Needed label Jun 14, 2022

ogrisel reviewed Jun 15, 2022

View reviewed changes

sklearn/linear_model/_glm/glm.py Outdated Show resolved Hide resolved

lorentzenchr removed the No Changelog Needed label Jun 15, 2022

ogrisel reviewed Jun 15, 2022

View reviewed changes

sklearn/linear_model/_glm/glm.py Outdated Show resolved Hide resolved

lorentzenchr added 5 commits June 15, 2022 21:40

TST fix assert singular values in datasets

9b8519d

CLN address most review comments

68947b6

ENH enable more vebosity levels for lbfgs

06a0b79

DOC add whatsnew

4d245cf

Merge branch 'main' into glm_tests

f6bee64

ogrisel reviewed Jun 15, 2022

View reviewed changes

sklearn/linear_model/_glm/tests/test_glm.py Outdated Show resolved Hide resolved

lorentzenchr added 2 commits June 16, 2022 19:23

CLN remove xfail and clean a bit

382d177

CLN docstring about minimum norm

25fe6e1

ogrisel reviewed Jun 16, 2022

View reviewed changes

sklearn/linear_model/_glm/tests/test_glm.py Show resolved Hide resolved

More informative repr for the glm_dataset fixture cases

4c4582d

ogrisel approved these changes Jun 16, 2022

View reviewed changes

Forgot to run black

2065a9e

jjerphan self-requested a review June 16, 2022 19:13

agramfort reviewed Jun 16, 2022

View reviewed changes

sklearn/linear_model/_glm/tests/test_glm.py Show resolved Hide resolved

sklearn/linear_model/_glm/tests/test_glm.py Outdated Show resolved Hide resolved

jjerphan reviewed Jun 16, 2022

View reviewed changes

CLN remove unnecessary filterwarnings

5aaaf21

lorentzenchr added 4 commits June 18, 2022 14:20

Trigger with -Werror [all random seeds]

99f4cf9

test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start

ENH increase maxls to 50

79ec862

[all random seeds]

904e960

test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start

Revert "Trigger with -Werror [all random seeds]"

4fd1d9b

This reverts commit 99f4cf9.

trigger normal CI

e0cde36

DEBUG disable pytest-xdist and run [all random seeds]

9bb4ae2

test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start

Reenable pytest-xdist on CI

1039a6e

ogrisel mentioned this pull request Jun 20, 2022

Zero init for the intercept of GLMs lorentzenchr/scikit-learn#7

Closed

ogrisel and others added 3 commits June 24, 2022 15:07

Merge branch 'main' into glm_tests

0dd1a95

review comment from other PR ;-)

386d469

Merge branch 'glm_tests' of https://github.com/lorentzenchr/scikit-learn

2a81628

into glm_tests

jjerphan approved these changes Jun 29, 2022

View reviewed changes

CLN address some review comments

20f15d7

jjerphan reviewed Jun 29, 2022

View reviewed changes

doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved

lorentzenchr added 2 commits June 29, 2022 15:30

CLN add comment on solver parameter

28fdbab

DOC align whatsnew entries

79868af

Remove xfail from test_glm_regression_hstacked_X [all random seeds]

03a1089

test_glm_regression_hstacked_X

ogrisel merged commit 9d863ab into scikit-learn:main Jun 29, 2022

ogrisel added a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022

TST tight tests for GLMs (scikit-learn#23619)

0d5b6be

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

lorentzenchr deleted the glm_tests branch October 25, 2022 14:04

Uh oh!

Conversation

lorentzenchr commented Jun 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

ogrisel commented Jun 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

lorentzenchr commented Jun 15, 2022

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

agramfort left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jjerphan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lorentzenchr commented Jun 18, 2022

Uh oh!

lorentzenchr commented Jun 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 20, 2022

Uh oh!

ogrisel commented Jun 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lorentzenchr commented Jun 27, 2022

Uh oh!

jjerphan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lorentzenchr commented Jun 14, 2022 •

edited

Loading

ogrisel commented Jun 15, 2022 •

edited

Loading

lorentzenchr commented Jun 18, 2022 •

edited

Loading

ogrisel commented Jun 20, 2022 •

edited

Loading

ogrisel commented Jun 20, 2022 •

edited

Loading

ogrisel commented Jun 24, 2022 •

edited

Loading