Description
LogisticRegression with the lbfgs solver terminates early, even when tol is decreased and max_iter has not been reached.
Code to Reproduce
We fit random data twice, changing only the order of the examples. Ideally, example order should not matter: the fit coefficients should be the same either way. I produced the results below by running this code in Colab.
```python
from sklearn.linear_model import LogisticRegression
import numpy as np

n_features = 1000
n_examples = 1500
np.random.seed(0)
x = np.random.random((n_examples, n_features))
y = np.random.randint(2, size=n_examples)

max_iter = 1000
solver = 'lbfgs'
for tol in [1e-2, 1e-3, 1e-4, 1e-5]:
    np.random.seed(0)
    lr1 = LogisticRegression(solver=solver, max_iter=max_iter, tol=tol).fit(x, y)
    np.random.seed(0)
    lr2 = LogisticRegression(solver=solver, max_iter=max_iter, tol=tol).fit(x[::-1], y[::-1])
    print(f'tol={tol}')
    print(f'  Optimizer iterations, forward order: {lr1.n_iter_[0]}, reverse order: {lr2.n_iter_[0]}.')
    print(f'  Mean absolute diff in coefficients: {np.abs(lr1.coef_ - lr2.coef_).mean()}')
```

Expected Results
As tol is reduced, the difference between coefficients should continue to decrease, provided max_iter is not reached. When solver is changed to 'newton-cg', we get the expected behavior:
```
tol=0.01
  Optimizer iterations, forward order: 12, reverse order: 11.
  Mean absolute diff in coefficients: 0.0004846833304941047
tol=0.001
  Optimizer iterations, forward order: 15, reverse order: 14.
  Mean absolute diff in coefficients: 5.4776672871601846e-05
tol=0.0001
  Optimizer iterations, forward order: 19, reverse order: 16.
  Mean absolute diff in coefficients: 1.6047945654930538e-06
tol=1e-05
  Optimizer iterations, forward order: 19, reverse order: 17.
  Mean absolute diff in coefficients: 2.76826465093659e-07
```
Actual Results
As tol is reduced, the optimizer does not take more steps, despite not having converged:
```
tol=0.01
  Optimizer iterations, forward order: 362, reverse order: 376.
  Mean absolute diff in coefficients: 0.0007590864459748883
tol=0.001
  Optimizer iterations, forward order: 373, reverse order: 401.
  Mean absolute diff in coefficients: 0.0006877678611572595
tol=0.0001
  Optimizer iterations, forward order: 373, reverse order: 401.
  Mean absolute diff in coefficients: 0.0006877678611572595
tol=1e-05
  Optimizer iterations, forward order: 373, reverse order: 401.
  Mean absolute diff in coefficients: 0.0006877678611572595
```
Versions
Output of sklearn.show_versions():
```
System:
    python: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
    executable: /usr/bin/python3
    machine: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic

Python dependencies:
    pip: 19.3.1
    setuptools: 49.2.0
    sklearn: 0.23.1
    numpy: 1.18.5
    scipy: 1.4.1
    Cython: 0.29.21
    pandas: 1.0.5
    matplotlib: 3.2.2
    joblib: 0.16.0
    threadpoolctl: 2.1.0
Built with OpenMP: True
```
Diagnosis
I'm fairly sure the issue is in the call to scipy.optimize.minimize at this line in linear_model/_logistic.py. The value of tol is passed to minimize as gtol, but ftol and eps are left at their SciPy defaults. In the example above, I believe the optimizer is hitting the ftol termination condition rather than the gtol one. Possible solutions:
- Scale down `ftol` and `eps` by some multiple of `tol`.
- Scale down `eps` by some multiple of `tol` and set `ftol` to zero.
- Allow the user of LogisticRegression to control `ftol` and `eps` through additional kwargs.
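
The gtol/ftol interaction can be illustrated directly with scipy.optimize.minimize, outside of scikit-learn. This is a minimal sketch on a synthetic ill-conditioned quadratic (an assumption for illustration, not scikit-learn's actual objective): tightening gtol alone does not help, because L-BFGS-B can still stop on the ftol condition, which is left at its default; scaling ftol down as well lets the optimizer keep iterating.

```python
import numpy as np
from scipy.optimize import minimize

# Ill-conditioned positive-definite quadratic so L-BFGS-B makes
# slow relative progress in f, triggering the ftol stopping rule.
rng = np.random.RandomState(0)
A = rng.random((50, 50))
H = A.T @ A + 1e-6 * np.eye(50)

def f(w):
    return 0.5 * w @ H @ w

def grad(w):
    return H @ w

w0 = rng.random(50)

# Tight gtol only: ftol keeps its SciPy default, so the run can
# terminate on relative function decrease regardless of gtol.
res_default = minimize(f, w0, jac=grad, method='L-BFGS-B',
                       options={'gtol': 1e-12})
# Tight gtol AND tight ftol: the ftol condition no longer fires
# early, so the optimizer takes at least as many iterations.
res_tight = minimize(f, w0, jac=grad, method='L-BFGS-B',
                     options={'gtol': 1e-12, 'ftol': 1e-16})

print('default ftol:', res_default.nit, 'iterations')
print('tight ftol:  ', res_tight.nit, 'iterations')
```

The iterate sequence is identical until one run terminates, so the run with the tighter ftol never stops earlier than the default one; this mirrors what LogisticRegression would need to do internally if it scaled ftol down along with tol.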