Fix sample weight handling in scoring _log_reg_scoring_path by snath-xoc · Pull Request #29419 · scikit-learn/scikit-learn

snath-xoc · 2024-07-05T10:40:22Z

Reference Issues/PRs

Fixes #29416

What does this implement/fix? Explain your changes.

Pass the test-set sample weights to the default score calculation in _log_reg_scoring_path, so that scores on the held-out fold are weighted as well.
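To make the intent concrete, here is a minimal sketch of what a weighted test score means. It is not the code from this PR: weighted_accuracy is a hypothetical helper written in plain Python, and the labels and weights are made up for illustration. The property it shows is the one the thread relies on: giving a sample an integer weight should yield the same score as repeating that sample.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, each sample counted by its weight.

    Hypothetical helper for illustration, not a scikit-learn function.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Made-up test fold: the third sample is misclassified and carries weight 3.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
weights = [1, 1, 3, 1]

unweighted = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, weights)

# Repeating each sample `weight` times must give the same score as weighting it.
y_true_rep = [t for t, w in zip(y_true, weights) for _ in range(w)]
y_pred_rep = [p for p, w in zip(y_pred, weights) for _ in range(w)]
repeated = weighted_accuracy(y_true_rep, y_pred_rep)
```

If the scorer ignores the weights, the score on this fold stays at the unweighted value and no longer matches the repeated-sample score.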

TO DO:

So far the test passes with max_iter=10_000 and tol=1e-8. It is expected to pass with tol=1e-12 as well, but this currently fails with the lbfgs solver.
Switch changelog to v1.6.rst

github-actions · 2024-07-05T10:41:44Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: cb2f28b.

ogrisel · 2024-07-05T12:18:13Z

Thanks for the PR. Could you please add a non-regression test based on the reproducer provided in #29416 and make it run on all solvers with a small tolerance such as tol=1e-12? Strict equivalence might be hard to reach for stochastic solvers such as SAG/SAGA, but I am not sure.
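A rough sketch of the kind of check being requested follows. It is not the reproducer from #29416 nor the test added in this PR: it uses LogisticRegression rather than LogisticRegressionCV, the helper max_coef_gap is hypothetical, and the synthetic dataset, the weights, and the settings tol=1e-8 with max_iter=10_000 are illustrative assumptions taken loosely from the values mentioned in the description.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def max_coef_gap(solver, seed=0):
    """Largest coefficient gap between weighted and repeated-sample fits."""
    X, y = make_classification(n_samples=60, n_features=5, random_state=seed)
    rng = np.random.RandomState(seed)
    sample_weight = rng.randint(1, 4, size=X.shape[0])  # integer weights in {1, 2, 3}

    # Repeating row i sample_weight[i] times should be equivalent to weighting it.
    X_repeated = np.repeat(X, sample_weight, axis=0)
    y_repeated = np.repeat(y, sample_weight, axis=0)

    params = dict(solver=solver, tol=1e-8, max_iter=10_000)
    weighted = LogisticRegression(**params).fit(X, y, sample_weight=sample_weight)
    repeated = LogisticRegression(**params).fit(X_repeated, y_repeated)
    return float(np.max(np.abs(weighted.coef_ - repeated.coef_)))


# Deterministic solvers only; sag and saga are stochastic and left out of this sketch.
max_diffs = {s: max_coef_gap(s) for s in ("lbfgs", "newton-cg", "liblinear")}
```

A real test would turn each gap into an assertion, for example with numpy.testing.assert_allclose, and the tolerance it can afford depends on the solver and on how tightly it converges.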


snath-xoc · 2024-07-09T21:56:48Z

Note: in test_logistic_regression_sample_weights, the sag and saga solvers are left out because they systematically fail.

ogrisel · 2024-08-07T15:39:01Z

@snath-xoc could you please push a commit that triggers running the updated test in this PR for all admissible values of global_random_seed? Instructions are available in this information issue: #28959.

EDIT: before testing on the CI, you should test locally with commands like the following:

SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -k test_logistic_regression_sample_weights sklearn/linear_model/tests/test_logistic.py -vlx

At the moment, the test fails for many seeds, but this can be fixed as explained in the review comments below.

ogrisel

Here is some feedback to help move this PR forward. Please don't forget to add an entry to the changelog.

sklearn/linear_model/tests/test_logistic.py

ogrisel · 2024-08-07T16:11:30Z

I find this test too long to follow. You might want to split it into two independent tests: one that checks the equivalence between integer sample weights and their repeated counterpart, and one that checks the equivalence between class weights and sample weights.

doc/whats_new/v1.5.rst

ogrisel · 2024-08-09T14:06:20Z

To actually trigger [all random seeds] tests you need to include the list of test function names after the [all random seeds] flag in the commit message as explained in #28959.

test_logistic_regression_sample_weights

test_sample_and_class_weight_equivalence_liblinear test_logistic_regression_sample_weights test_logistic_regression_solver_class_weights

OmarManzoor

Otherwise LGTM. Thanks @snath-xoc

doc/whats_new/v1.5.rst

doc/whats_new/v1.6.rst

ogrisel

Some more minor feedback on top of Omar's but otherwise, LGTM!

sklearn/linear_model/tests/test_logistic.py

doc/whats_new/v1.5.rst

doc/whats_new/v1.6.rst

sklearn/linear_model/tests/test_logistic.py

ogrisel · 2024-09-05T09:20:03Z

Note for later: the LogisticRegressionCV class has the following tag to skip the common test:

    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_sample_weights_invariance": (
                    "zero sample_weight is not equivalent to removing samples"
                ),
            }
        }

However, check_sample_weights_invariance currently does not work well with kind="zeros" when the estimator has a cv constructor parameter. This common test should be updated to drop samples without moving the CV boundaries for the remaining training points.

This would be better done in a dedicated PR, though.
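The boundary problem can be illustrated with a small sketch in plain Python. It assumes unshuffled, contiguous folds, and contiguous_folds is a hypothetical helper, not part of scikit-learn or of the common test. With zero weights the folds are computed on all rows, while removing the rows first shifts where the folds are cut, so the remaining samples end up grouped differently.

```python
def contiguous_folds(n_samples, n_splits):
    """Split range(n_samples) into n_splits contiguous test folds, no shuffling.

    Hypothetical helper for illustration only.
    """
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


sample_ids = list(range(6))
weights = [0, 0, 1, 1, 1, 1]  # the first two samples are switched off

# Zero weights: folds are cut over all six rows, then zero-weight rows are ignored.
folds_zero = contiguous_folds(len(sample_ids), 2)
effective_zero = [[s for s in fold if weights[s] > 0] for fold in folds_zero]

# Removed rows: folds are cut over the four remaining rows only.
kept = [s for s in sample_ids if weights[s] > 0]
folds_removed = [[kept[i] for i in fold] for fold in contiguous_folds(len(kept), 2)]
```

Here effective_zero groups the remaining samples as [[2], [3, 4, 5]], whereas folds_removed groups them as [[2, 3], [4, 5]], so the two settings do not evaluate the same train/test splits even though the weighted data are equivalent.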

ogrisel · 2024-09-06T12:18:01Z

Merged! Thanks very much @snath-xoc!

add sample weight to default scoring _log_reg_scoring_path

93e46a8

github-actions bot added the module:linear_model label Jul 5, 2024

ogrisel mentioned this pull request Jul 5, 2024

LogisticRegressionCV does not handle sample weights as expected when using liblinear solver #29416

Closed

merged test for sample weight into test_logistic_regression_sample_we…

d177425

…ights

ogrisel reviewed Aug 7, 2024

snath-xoc added 4 commits August 8, 2024 18:35

modified test_logistic [all random seeds]

5daf9cf

modified changelog [all random seeds]

a601b15

increase max iter in l1l2 liblinear tests [all random seeds]

4e15ba4

fixed indent error [all random seeds]

f8221a6

ogrisel reviewed Aug 9, 2024

doc/whats_new/v1.5.rst (outdated, resolved)

ogrisel reviewed Aug 9, 2024

doc/whats_new/v1.5.rst (outdated, resolved)

snath-xoc and others added 10 commits August 21, 2024 15:53

[all random seeds]

e2376a0

test_logistic_regression_sample_weights

[all random seeds]

4a2cf8a

test_logistic_regression_sample_weights

fix tol for sag and saga [all random seeds]

503bdaa

test_logistic_regression_sample_weights

fix max_iter for sag and saga [all random seeds]

431735f

test_logistic_regression_sample_weights

Merge branch 'main' into logisticregressioncv_sample_weight

f4867a7

[all random seeds]

6ebe58e

test_logistic_regression_sample_weights

[all random seeds]

6e1fe1b

test_sample_and_class_weight_equivalence_liblinear test_logistic_regression_sample_weights test_logistic_regression_solver_class_weights

[all random seeds]

46a857a

test_sample_and_class_weight_equivalence_liblinear test_logistic_regression_sample_weights test_logistic_regression_solver_class_weights

[all random seeds]

ac76991

test_sample_and_class_weight_equivalence_liblinear test_logistic_regression_sample_weights test_logistic_regression_solver_class_weights

[all random seeds]

7c102b0

test_sample_and_class_weight_equivalence_liblinear test_logistic_regression_sample_weights test_logistic_regression_solver_class_weights

snath-xoc marked this pull request as ready for review August 26, 2024 09:16

OmarManzoor approved these changes Sep 4, 2024

doc/whats_new/v1.5.rst (outdated, resolved)

doc/whats_new/v1.6.rst (outdated, resolved)

ogrisel approved these changes Sep 4, 2024

ogrisel mentioned this pull request Sep 5, 2024

RFC Sample weight invariance properties #15657

Open

This was referenced Sep 5, 2024

Fix elasticnect cv sample weight #29442

Merged

List of estimators with known incorrect handling of sample_weight #16298

Open

Apply suggestions from code review

4558805

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman2007@gmail.com>

ogrisel enabled auto-merge (squash) September 6, 2024 08:49

Fix linting problem

cb2f28b

ogrisel merged commit 7baa11e into scikit-learn:main Sep 6, 2024

ogrisel changed the title ~~add sample weight to default scoring _log_reg_scoring_path~~ Fix sample weight handling in scoring _log_reg_scoring_path Sep 6, 2024

ogrisel mentioned this pull request Sep 6, 2024

Make check_sample_weights_invariance cv-aware #29796

Merged
