LogisticRegressionCV does not handle sample weights as expected when using liblinear solver #29416

@snath-xoc

Note: this is a special case of the wider problem described in:

Describe the bug

_log_reg_scoring_path, which is used within LogisticRegressionCV, does not return the same coefficients with the liblinear solver when samples are weighted through sample_weight as it does when the samples are repeated according to those weights.
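
For context, here is a short sketch of why the two fits are expected to agree. It assumes the usual L2-penalised logistic regression objective with labels $y_i \in \{-1, 1\}$, integer weights $s_i$, and per-sample loss $\ell_i$; the symbols are illustrative and not taken from the scikit-learn source.

```latex
\begin{aligned}
\ell_i(w, b) &= \log\!\left(1 + \exp\!\left(-y_i\,(x_i^\top w + b)\right)\right) \\
J_{\text{weighted}}(w, b) &= \tfrac{1}{2}\, w^\top w + C \sum_{i=1}^{n} s_i\, \ell_i(w, b) \\
J_{\text{repeated}}(w, b) &= \tfrac{1}{2}\, w^\top w + C \sum_{i=1}^{n} \sum_{k=1}^{s_i} \ell_i(w, b)
  = \tfrac{1}{2}\, w^\top w + C \sum_{i=1}^{n} s_i\, \ell_i(w, b)
\end{aligned}
```

Both objectives are identical, so for a fixed C the minimiser should be the same up to solver tolerance. In the cross-validated case the same argument has to hold for the scores used to select C as well, which is where an unweighted scorer can break the equivalence.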

NOTE: L801 in _log_reg_scoring_path does not pass sample_weight to the scorer when no scorer is specified; this needs fixing.
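
To illustrate why an unweighted scorer matters, here is a minimal sketch on toy data (the arrays are made up for illustration and are not from the reproducer). It shows that accuracy computed with sample_weight matches accuracy on the repeated data, while accuracy computed without weights does not.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels, predictions and integer weights (illustrative values only).
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1])
sw = np.array([3, 1, 0, 2])

# Score that ignores the weights.
unweighted = accuracy_score(y_true, y_pred)

# Score that uses the weights.
weighted = accuracy_score(y_true, y_pred, sample_weight=sw)

# Score on the data set where each row is repeated sw[i] times.
repeated = accuracy_score(np.repeat(y_true, sw), np.repeat(y_pred, sw))

print(unweighted, weighted, repeated)
```

If the scores used to pick C are computed without weights on the weighted data but on repeated rows in the other run, the two cross-validation runs may select different values of C and therefore end up with different coefficients.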

Steps/Code to Reproduce

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import get_scorer
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import LeaveOneGroupOut


import sklearn
sklearn.set_config(enable_metadata_routing=True)

rng = np.random.RandomState(0)

X, y = make_classification(
    n_samples=300000,
    n_features=8,
    random_state=10,
    n_informative=4,
    n_classes=2,
)


n_samples = X.shape[0] // 3
sw = np.ones_like(y)

# Give each sample in the first fold a random integer weight in [0, 5).
sw[:n_samples] = rng.randint(0, 5, size=n_samples)
groups_sw = np.r_[
    np.full(n_samples, 0), np.full(n_samples, 1), np.full(n_samples, 2)
]
splits_weighted = list(LeaveOneGroupOut().split(X, groups=groups_sw))

# Repeat each sample according to its weight and build the matching splits
# for the resampled data ourselves.
X_resampled_by_weights = np.repeat(X, sw.astype(int), axis=0)

# Net number of rows added by the repetition (needed to size the first group).
n_reps = X_resampled_by_weights.shape[0] - X.shape[0]

y_resampled_by_weights = np.repeat(y, sw.astype(int), axis=0)
groups = np.r_[
    np.full(n_reps + n_samples, 0), np.full(n_samples, 1), np.full(n_samples, 2)
]
splits_repeated = list(LeaveOneGroupOut().split(X_resampled_by_weights, groups=groups))

est_weighted = LogisticRegression(solver="liblinear").fit(X, y, sample_weight=sw)
est_repeated = LogisticRegression(solver="liblinear").fit(X_resampled_by_weights, y_resampled_by_weights)

np.testing.assert_allclose(est_weighted.coef_, est_repeated.coef_)


est_weighted = LogisticRegressionCV(cv=splits_weighted, solver="liblinear").fit(X, y, sample_weight=sw)

est_repeated = LogisticRegressionCV(cv=splits_repeated, solver="liblinear").fit(X_resampled_by_weights, y_resampled_by_weights)

np.testing.assert_allclose(est_weighted.coef_, est_repeated.coef_)
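
The group bookkeeping in the script above can be checked on a small array. This is a minimal sketch with fixed, made-up weights (not the random weights of the reproducer); it confirms that the first group of the repeated data has as many rows as the total weight of the first fold, and that each test fold of the repeated split holds exactly the repeated rows of the corresponding weighted fold.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

n_samples = 4  # size of each of the three folds (illustrative)
X = np.arange(3 * n_samples).reshape(-1, 1)

# Fixed integer weights for the first fold, ones elsewhere.
sw = np.ones(3 * n_samples, dtype=int)
sw[:n_samples] = [2, 0, 3, 1]

X_rep = np.repeat(X, sw, axis=0)
n_reps = X_rep.shape[0] - X.shape[0]

groups_sw = np.repeat([0, 1, 2], n_samples)
groups_rep = np.r_[
    np.full(n_reps + n_samples, 0), np.full(n_samples, 1), np.full(n_samples, 2)
]

# Rows in the first repeated group vs. total weight of the first fold.
first_fold_rows = int((groups_rep == 0).sum())
first_fold_weight = int(sw[:n_samples].sum())

# Compare each test fold of both splits row by row.
folds_match = []
splits_w = LeaveOneGroupOut().split(X, groups=groups_sw)
splits_r = LeaveOneGroupOut().split(X_rep, groups=groups_rep)
for (_, test_w), (_, test_r) in zip(splits_w, splits_r):
    expected = np.repeat(X[test_w], sw[test_w], axis=0)
    folds_match.append(np.array_equal(expected, X_rep[test_r]))

print(first_fold_rows, first_fold_weight, folds_match)
```

With matching folds, any remaining difference between the two cross-validated fits has to come from how the weights are used during fitting or scoring rather than from the splits.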

Expected Results

No error is thrown

Actual Results

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 8 / 8 (100%)
Max absolute difference among violations: 0.02352997
Max relative difference among violations: 10.49415031
 ACTUAL: array([[ 5.580057e-01,  1.455297e-01,  1.117538e-02,  9.940221e-04,
         2.078733e-05, -2.118241e-01, -2.361904e-01, -6.555003e-01]])
 DESIRED: array([[ 5.757953e-01,  1.541149e-01,  9.722671e-04,  1.094184e-03,
         1.143567e-04, -2.027509e-01, -2.405034e-01, -6.790303e-01]])

Versions

System:
    python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable: /Users/shrutinath/micromamba/envs/scikit-learn/bin/python
   machine: macOS-14.3-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.dev0
          pip: 24.0
   setuptools: 70.1.1
        numpy: 2.0.0
        scipy: 1.14.0
       Cython: 3.0.10
       pandas: None
   matplotlib: 3.9.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libopenblas.0.dylib
        version: 0.3.27
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libomp.dylib
        version: None

Labels: Bug, Metadata Routing (all issues related to metadata routing, slep006, sample props)
