[MRG+1] ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs'#11905
Conversation
This reverts commit d06bad1.
rth
left a comment
There was a problem hiding this comment.
LGTM. I think the multi_class= 'auto' is indeed better than the earlier alternatives.
+1 for merging and fixing the remaining points mentioned in the description after the RC.
@amueller or @agramfort would you be able to have a look?
doc/whats_new/v0.20.rst
Outdated
| :class:`linear_model.LogisticRegression` will change respectively from | ||
| ``'liblinear'`` and ``'ovr'`` in version 0.20 to ``'lbfgs'`` and | ||
| ``'auto'`` in version 0.22. A FutureWarning is raised when the default | ||
| values are used. :issue:`11476` by `Tom Dupre la Tour`_. |
sklearn/linear_model/logistic.py
Outdated
| Y_multi = le.fit_transform(y).astype(X.dtype, copy=False) | ||
|
|
||
| w0 = np.zeros((classes.size, n_features + int(fit_intercept)), | ||
| w0 = np.zeros((max(2, classes.size), n_features + int(fit_intercept)), |
There was a problem hiding this comment.
@amueller 's #11476 (comment) still apply here I think
There was a problem hiding this comment.
It seems this can be safely reverted. I think we block classes.size < 2...
sklearn/linear_model/logistic.py
Outdated
|
|
||
| if multi_class == 'auto': | ||
| multi_class = ('ovr' if coefs[0].shape[0] <= 2 or solver == 'liblinear' | ||
| else 'multinomial') |
There was a problem hiding this comment.
Could we use _check_multi_class here instead of the above line, or is that redundant?
There was a problem hiding this comment.
And this can be safely removed. _log_reg_scoring_path assumes pre-validation.
|
If you've verified that you're happy with the new tests, I think we can just merge and get the release on its way. |
|
btw, I did change it to not warn upon binary, but I've not fixed up the examples, docs, tests to not pass redundantly. |
|
Oh, this didn't change any examples, so... |
| y_bin = y_multi == 0 | ||
| est_auto_bin = fit(X, y_bin, multi_class='auto', solver=solver) | ||
| est_ovr_bin = fit(X, y_bin, multi_class='ovr', solver=solver) | ||
| assert np.allclose(est_auto_bin.coef_, est_ovr_bin.coef_) |
There was a problem hiding this comment.
Not critical, but I think assert_allclose produces better trackbacks in the case of failure, also it has a mores strict default relative and absolute tolerances we might want to use one or the other but not mix both.
| solver=solver).coef_) | ||
| assert not np.allclose(est_auto_bin.coef_, | ||
| fit(X, y_multi, multi_class='multinomial', | ||
| solver=solver).coef_) |
There was a problem hiding this comment.
Overall I think these tests are fairly thorough!
|
Merging with a +2 in the parent PR (#11476) and another +1 from #11476 (comment) for the added changes. Thanks a lot @jnothman for all your work on making the RC happen! |
* tag '0.20rc1': (1109 commits) MNT rc version DOC Release dates for 0.20 (scikit-learn#11838) DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937) FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936) MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896) [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916) ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905) MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899) DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907) DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910) Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870) MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894) EXA set figure size to avoid overlaps (scikit-learn#11889) MRG/REL fixes /skips for 32bit tests (scikit-learn#11879) add durations=20 to makefile to show test runtimes locally (scikit-learn#11147) DOC loss='l2' is no longer accpeted in l1_min_c DOC add note about brute force nearest neighbors for string data (scikit-learn#11884) DOC Change sign of energy in RBM (scikit-learn#11156) RFC try to warn on iid less often (scikit-learn#11613) DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664) ...
* releases: (1109 commits) MNT rc version DOC Release dates for 0.20 (scikit-learn#11838) DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937) FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936) MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896) [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916) ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905) MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899) DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907) DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910) Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870) MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894) EXA set figure size to avoid overlaps (scikit-learn#11889) MRG/REL fixes /skips for 32bit tests (scikit-learn#11879) add durations=20 to makefile to show test runtimes locally (scikit-learn#11147) DOC loss='l2' is no longer accpeted in l1_min_c DOC add note about brute force nearest neighbors for string data (scikit-learn#11884) DOC Change sign of energy in RBM (scikit-learn#11156) RFC try to warn on iid less often (scikit-learn#11613) DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664) ...
* dfsg: (1109 commits) MNT rc version DOC Release dates for 0.20 (scikit-learn#11838) DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937) FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936) MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896) [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916) ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905) MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899) DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907) DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910) Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870) MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894) EXA set figure size to avoid overlaps (scikit-learn#11889) MRG/REL fixes /skips for 32bit tests (scikit-learn#11879) add durations=20 to makefile to show test runtimes locally (scikit-learn#11147) DOC loss='l2' is no longer accpeted in l1_min_c DOC add note about brute force nearest neighbors for string data (scikit-learn#11884) DOC Change sign of energy in RBM (scikit-learn#11156) RFC try to warn on iid less often (scikit-learn#11613) DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664) ...
Adds multi_class='auto'
'auto' will be default from 0.22. Also sets solver='lbfgs' to default from 0.22.
This is okay to merge for RC, but I've not yet done a couple of things we should do for the final version:
multi_classin docs/examples/tests when warning will therefore be silenced.I may not have time to finish this for a couple of days.
Supersedes and closes #11476
Supersedes and closes #10001
Closes #9997