FEA add (single) Cholesky Newton solver to GLMs #24637
jjerphan merged 115 commits into scikit-learn:main from …
Conversation
An unrelated question: why don't we link codecov in the CI anymore?
Great. So let's hope the CI gets 🟢.
All 💚! I'll let @ogrisel merge if everything looks good to him.
Wait a minute.
lorentzenchr left a comment:
One open question to go...
sklearn/linear_model/_glm/glm.py (Outdated)
# 2.2 Deal with relative gradient differences around machine precision.
tiny_grad = sum_abs_grad_old * eps
abs_grad_improvement = np.abs(sum_abs_grad - sum_abs_grad_old)
check = abs_grad_improvement <= tiny_grad
if is_verbose:
    print(
        " check |sum(|gradient|) - sum(|gradient_old|)| <= eps * "
        "sum(|gradient_old|):"
        f" {abs_grad_improvement} <= {tiny_grad} {check}"
    )
if check:
    break
# 2.3 This is really the last resort.
# Check that sum(|gradient_{i-1}|) < sum(|gradient_{i-2}|)
#   = has_improved_sum_abs_grad_previous
# If now sum(|gradient_{i}|) >= sum(|gradient_{i-1}|), this iteration
# made things worse and we should have stopped at i-1.
I'm undecided whether we need 2.2 and 2.3 in the line search. We could just fail and resort to lbfgs.
I added those 2 criteria when I was trying fallback strategies (e.g. simple gradient steps) for situations like indefinite hessian, negative pointwise hessian values, collinear features.
Indeed. I think we can first validate this solver by merging this PR and do this study in a follow-up PR. What do you think?
Suggested change (add a TODO note above the checks quoted above):

# TODO: study removing 2.2 and 2.3 for instead resorting to lbfgs-b.
My main problem is that I can't find the settings anymore that led me to implement those conditions. I could also remove them now and we can reinsert them if problems arise. Those lines are still present in #23314.
Can you please try to remove them and re-run the 2 PR examples and the benchmarks of this comment?
If the benchmark curves still look the same with no new non-convergence warnings, then I am fine merging the PR without those checks and considering re-adding them if we observe a ConvergenceWarning on a real dataset.
Also, I still think we could extract the line search code into a standalone function with dedicated unit tests. That would probably help trigger those conditions on arbitrary optimization problems (such as multidimensional generalisations of the Rosenbrock function). A rough sketch of such a test is shown below.
This can be done in a follow-up PR if what I said in the previous comment does not reveal any problem by itself.
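For illustration only, here is a minimal sketch of the kind of standalone test this suggests: a generic backtracking (Armijo) line search exercised on an n-dimensional Rosenbrock objective. This is not the PR's actual line search code, and all helper names here are hypothetical.

import numpy as np


def rosenbrock(x):
    # Multidimensional Rosenbrock: sum_i 100*(x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)


def rosenbrock_grad(x):
    grad = np.zeros_like(x)
    grad[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    grad[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return grad


def backtracking_line_search(fun, grad, x, direction, t=1.0, beta=0.5, c=1e-4):
    # Shrink the step size until the Armijo sufficient-decrease condition holds.
    f0 = fun(x)
    slope = grad(x) @ direction
    while fun(x + t * direction) > f0 + c * t * slope:
        t *= beta
    return t


def test_line_search_decreases_rosenbrock():
    # Gradient descent with this line search must decrease the objective.
    x = np.full(5, -1.0)
    f_start = rosenbrock(x)
    for _ in range(100):
        d = -rosenbrock_grad(x)
        x = x + backtracking_line_search(rosenbrock, rosenbrock_grad, x, d) * d
    assert rosenbrock(x) < f_start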
Check of final changes:
Let's wait for the CI to complete; if there is no new …
No …
Merged! Thanks a lot for bringing this solver in, @lorentzenchr! Thanks @ogrisel for your comprehensive review and analysis.
+1 for binary logistic regression. Multinomial LR will probably require significantly more work and will be problematic when both n_classes and n_features are not small at the same time.
* FEA add NewtonSolver, CholeskyNewtonSolver and QRCholeskyNewtonSolver * ENH better singular hessian special solve * CLN fix some typos found by reviewer * TST assert ConvergenceWarning is raised * MNT add BaseCholeskyNewtonSolver * WIP colinear design in GLMs * FIX _solve_singular * FIX false unpacking in * TST add tests for unpenalized GLMs * TST fix solutions of glm_dataset * ENH add SVDFallbackSolver * CLN remove SVDFallbackSolver * ENH use gradient step for singular hessians * ENH print iteration number in warnings * TST improve test_linalg_warning_with_newton_solver * CLN LinAlgWarning fron scipy.linalg * ENH more robust hessian * ENH increase maxls for lbfgs to make it more robust * ENH add hessian_warning for too many negative hessian values * CLN some warning messages * ENH add lbfgs_step * ENH use lbfgs_step for hessian_warning * TST make them pass * TST tweek rtol for lbfgs * TST add rigoros test for GLMs * TST improve test_warm_start * ENH improve lbfgs options for better convergence * CLN fix test_warm_start * TST fix assert singular values in datasets * CLN address most review comments * ENH enable more vebosity levels for lbfgs * DOC add whatsnew * CLN remove xfail and clean a bit * CLN docstring about minimum norm * More informative repr for the glm_dataset fixture cases * Forgot to run black * CLN remove unnecessary filterwarnings * CLN address review comments * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN add comment for lbfgs ftol=64 * machine precision * CLN XXX code comment * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN link issue and remove code snippet in comment * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * CLN add catch_warnings * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Trigger [all random seeds] on the following tests: test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Trigger with -Werror [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * ENH increase maxls to 50 * [all random seeds] test_glm_regression test_glm_regression_hstacked_X test_glm_regression_vstacked_X 
test_glm_regression_unpenalized test_glm_regression_unpenalized_hstacked_X test_glm_regression_unpenalized_vstacked_X test_warm_start * Revert "Trigger with -Werror [all random seeds]" This reverts commit 99f4cf9. * TST add catch_warnings to filterwarnings * TST adapt tests for newton solvers * CLN cleaner gradient step with gradient_times_newton * DOC add whatsnew * ENH always use lbfgs as fallback * TST adapt rtol * TST fix test_linalg_warning_with_newton_solver * CLN address some review comments * Improve tests related to convergence warning on collinear data * overfit -> fit * Typo in comment * Apply suggestions from code review * ENH fallback_lbfgs_solve - Do not use lbfgs steps, fall back complete to lbfgs * ENH adapt rtol * Improve test_linalg_warning_with_newton_solver * Better comments * Fixed Hessian casing and improved warning messages * [all random seeds] test_linalg_warning_with_newton_solver * Ignore ConvergenceWarnings for now if convergence is good * CLN remove counting of warnings * ENH fall back to lbfgs if line search did not converge * DOC better comment on performance bottleneck * Update GLM related examples to use the new solver * CLN address reviewer comments * EXA improve some wordings * CLN do not pop "solver in parameter constraints * CLN fix typos * DOC fix docstring * CLN remove solver newton-qr-cholesky * DOC update PR number in whatsnew * CLN address review comments * CLN remove unnecessary catch_warnings * CLN address some review comments * DOC more precise whatsnew * CLN use init_zero_coef * CLN use and test init_zero_coef * CLN address some review comments * CLN mark NewtonSolver as private by leading underscore * CLN exact comments for inner_solve * TST add test_newton_solver_verbosity * TST extend test_newton_solver_verbosity * TST logic in test_glm_regression_unpenalized * TST use count_nonzero * CLN remove super rare line search checks * MNT move Newton solver to new file _newton_solver.py Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>



Reference Issues/PRs
Fixes #16634.
What does this implement/fix? Explain your changes.
This PR adds one Newton solver where the Newton step is obtained by a Cholesky decomposition. This solver is a good choice for n_samples >> n_features in well-behaved settings (positive definite Hessian, usually attained with at least a small L2 penalty). This is basically the same as iteratively reweighted least squares (IRLS) with an inner Cholesky-based solver on the normal equations.
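For intuition, here is a minimal sketch of a single Newton step computed via a Cholesky factorization of the Hessian using scipy.linalg; this is an illustration, not the PR's actual NewtonSolver code.

import numpy as np
from scipy.linalg import cho_factor, cho_solve


def cholesky_newton_step(gradient, hessian):
    # Solve hessian @ step = -gradient; cho_factor raises LinAlgError
    # if the Hessian is not positive definite.
    c_and_lower = cho_factor(hessian)
    return cho_solve(c_and_lower, -gradient)


# On a quadratic objective 0.5 * x' H x - b' x, one Newton step from x = 0
# lands exactly on the minimum H^{-1} b.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x = np.zeros(2)
x = x + cholesky_newton_step(H @ x - b, H)
assert np.allclose(H @ x, b)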
Any other comments?
PoissonRegressor, GammaRegressor and TweedieRegressor have the new solver. It would be very easy to extend it to LogisticRegression as well.
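Assuming the new solver is exposed through the estimators' solver parameter (e.g. solver="newton-cholesky"; the exact option name is an assumption here), usage would look roughly like:

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Many samples, few features: the regime where the Cholesky Newton solver
# is expected to be a good choice.
rng = np.random.RandomState(0)
X = rng.standard_normal(size=(10_000, 5))
true_coef = np.array([0.1, -0.2, 0.3, 0.0, 0.05])
y = rng.poisson(lam=np.exp(X @ true_coef))

# solver="newton-cholesky" is the assumed option name for the new solver.
reg = PoissonRegressor(solver="newton-cholesky", alpha=1e-4).fit(X, y)
print(reg.coef_)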