Fixes #11129: Change cv default to 5#11139
Fixes #11129: Change cv default to 5#11139shaz13 wants to merge 15 commits intoscikit-learn:masterfrom
Conversation
|
Upcoming commits with deprecation warning in progress. |
|
Please check out http://scikit-learn.org/dev/developers/contributing.html#deprecation |
sklearn/model_selection/_split.py
Outdated
| If None, the random number generator is the RandomState instance used | ||
| by `np.random`. Used when ``shuffle`` == True. | ||
|
|
||
| .. deprecated:: 0.20 |
There was a problem hiding this comment.
this should be "versionchanged" I think? Maybe? The parameter is not deprecated, only the default value changed.
|
thanks. Can you please check the travis failure? |
Modified the code. Added suitable code for test coverage
|
Update: Few more tests left to cover. Commit will be pushed by tomorrow. Thanks! |
jnothman
left a comment
There was a problem hiding this comment.
Rather than catching the warnings, we should make sure to pass cv explicitly in all docs and tests
sklearn/model_selection/_split.py
Outdated
| by `np.random`. Used when ``shuffle`` == True. | ||
|
|
||
| .. versionchanged:: 0.20 | ||
| The default value ``n_splits=3`` is deprecated in version 0.20 and will |
There was a problem hiding this comment.
Missed this in hurry. Done. Thanks! 👍
sklearn/model_selection/_split.py
Outdated
| if self.shuffle: | ||
| check_random_state(self.random_state).shuffle(indices) | ||
|
|
||
| if self.n_splits == 3: |
There was a problem hiding this comment.
This is the wrong place to warn. It should happen in check_cv. It should also not warn if the user explicitly sets cv to 3
There was a problem hiding this comment.
Moved it to check_cv. The tests passed locally.
|
Should this be a |
| warnings.warn("The default value of n_splits=3 is deprecated" | ||
| " in version 0.20 and will be changed to " | ||
| "n_splits=5 in version 0.22", DeprecationWarning) | ||
| if cv is None: |
There was a problem hiding this comment.
I guess the warning should only be displayed if cv is None, as cv will take the default value in that case.
Replaced Deprecation Warnings with Future Warnings
| splits via the ``split`` method. | ||
| """ | ||
| if cv is None: | ||
| warnings.warn("The default value of n_splits=3 is deprecated" |
There was a problem hiding this comment.
The default value of cv on line 1866 should be cv=None, otherwise we will never enter this block.
There was a problem hiding this comment.
maybe pass an explicit value for cv?
| param_range=param_range, cv=2 | ||
| ) | ||
| if len(w) > 0: | ||
| if len(w) > 1: |
There was a problem hiding this comment.
I think this is too loose. You could test != 1, with a comment on why you should expect 1 warning, or, better, you can remove the expected warning from w
| lars_cv = linear_model.LassoLarsCV(max_iter=5) | ||
| lars_cv.fit(X, y) | ||
| assert_true(len(w) == 0) | ||
| assert_true(len(w) >= 0) |
| result[val] = assert_no_warnings(cross_validate, estimator, X, y, | ||
| return_train_score=val) | ||
|
|
||
| result[val] = assert_warns_message(FutureWarning, msg_nsplit, |
There was a problem hiding this comment.
Firstly this needs to assert that no other warnings are issued. Secondly it needs to be clear to the developer tasked with completing this deprecation in a couple of years' time what it used to be asserting, i.e. that it was asserting an absence of warnings.t
Thirdly, the right solution is actually to pass an explicit value for cv here. Sorry I didn't notice that before.
qinhanmin2014
left a comment
There was a problem hiding this comment.
ping @shaz13 are you still working on it?
|
|
||
| Model evaluation and meta-estimators | ||
|
|
||
| - The default of ``n_splits`` parameter of :class:`model_selection.KFold` is |
There was a problem hiding this comment.
We have a API changes summary section for it. Also, I think we need a reason for the API change. See e.g.,
The default value of ``gamma`` parameter of :class:`svm.SVC`,
:class:`~svm.NuSVC`, :class:`~svm.SVR`, :class:`~svm.NuSVR`,
:class:`~svm.OneClassSVM` will change from ``'auto'`` to ``'scale'`` in
version 0.22 to account better for unscaled features. :issue:`8361` by
:user:`Gaurav Dhingra <gxyd>` and :user:`Ting Neo <neokt>`.
| splits via the ``split`` method. | ||
| """ | ||
| if cv is None: | ||
| warnings.warn("The default value of n_splits=3 is deprecated" |
There was a problem hiding this comment.
maybe pass an explicit value for cv?
| RandomizedSearchCV(LinearSVC(random_state=0), grid, | ||
| n_iter=2, iid=False)] | ||
|
|
||
| msg_nsplit = ("The default value of n_splits=3 is deprecated in " |
There was a problem hiding this comment.
I don't like the message here. It would be better if you can refer to merged PRs so that users can get similar messages for similar issues.
| fit_func = ignore_warnings(estimator.fit, | ||
| category=ConvergenceWarning) | ||
| result[val] = assert_no_warnings(fit_func, X, y).cv_results_ | ||
| result[val] = assert_warns_message(FutureWarning, msg_nsplit, |
There was a problem hiding this comment.
maybe pass an explicit value for cv?
| grid = GridSearchCV(SVC(gamma='scale'), param_grid={'C': [1]}, cv=3) | ||
| # no warning with equally sized test sets | ||
| assert_no_warnings(grid.fit, X, y) | ||
|
|
|
ping @qinhanmin2014. Unfortunately, I cannot contribute to opensource source code until I am in a contract with my employer. Please feel free to reassign the issue to another contributor. Thanks! |
|
Thanks @shaz13 for your great work so far :) |
|
@qinhanmin2014 Should I open up a new PR if I want to take over this fix? |
|
@jiaowoshabi You should open another PR :) |
|
@jiaowoshabi are you already working on it ? Otherwise I'd like to take over this PR for the current sprint. |
|
@aboucaud I'm not working on this PR. Go ahead on your fix |
Fixes #11129