Skip to content

Fixes #11129: Change cv default to 5#11139

Closed
shaz13 wants to merge 15 commits intoscikit-learn:masterfrom
shaz13:kfold-cv-default-5
Closed

Fixes #11129: Change cv default to 5#11139
shaz13 wants to merge 15 commits intoscikit-learn:masterfrom
shaz13:kfold-cv-default-5

Conversation

@shaz13
Copy link
Copy Markdown
Contributor

@shaz13 shaz13 commented May 25, 2018

Fixes #11129

@shaz13
Copy link
Copy Markdown
Contributor Author

shaz13 commented May 25, 2018

Upcoming commits with deprecation warning in progress.

@amueller
Copy link
Copy Markdown
Member

If None, the random number generator is the RandomState instance used
by `np.random`. Used when ``shuffle`` == True.

.. deprecated:: 0.20
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be "versionchanged" I think? Maybe? The parameter is not deprecated, only the default value changed.

@amueller
Copy link
Copy Markdown
Member

thanks. Can you please check the travis failure?
Also, you should make sure that our test suite actually doesn't throw these warnings anywhere. Either we should catch them, or we can specify the value explicitly to avoid the warning.

Modified the code.  Added suitable code for test coverage
@shaz13
Copy link
Copy Markdown
Contributor Author

shaz13 commented May 27, 2018

Update: Few more tests left to cover. Commit will be pushed by tomorrow. Thanks!

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than catching the warnings, we should make sure to pass cv explicitly in all docs and tests

by `np.random`. Used when ``shuffle`` == True.

.. versionchanged:: 0.20
The default value ``n_splits=3`` is deprecated in version 0.20 and will
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be indented

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this in hurry. Done. Thanks! 👍

if self.shuffle:
check_random_state(self.random_state).shuffle(indices)

if self.n_splits == 3:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong place to warn. It should happen in check_cv. It should also not warn if the user explicitly sets cv to 3

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to check_cv. The tests passed locally.

@shaz13
Copy link
Copy Markdown
Contributor Author

shaz13 commented May 29, 2018

@amueller @jnothman All checks have passed. Please take a look!

@qinhanmin2014
Copy link
Copy Markdown
Member

Should this be a FutureWarning instead of DeprecationWarning? We're not deprecating anything and I recall there's similar discussion elsewhere (e.g., #10331 (comment))

warnings.warn("The default value of n_splits=3 is deprecated"
" in version 0.20 and will be changed to "
"n_splits=5 in version 0.22", DeprecationWarning)
if cv is None:
Copy link
Copy Markdown
Contributor

@urvang96 urvang96 May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the warning should only be displayed if cv is None, as cv will take the default value in that case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

splits via the ``split`` method.
"""
if cv is None:
warnings.warn("The default value of n_splits=3 is deprecated"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of cv on line 1866 should be cv=None, otherwise we will never enter this block.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

param_range=param_range, cv=2
)
if len(w) > 0:
if len(w) > 1:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too loose. You could test != 1, with a comment on why you should expect 1 warning, or, better, you can remove the expected warning from w

lars_cv = linear_model.LassoLarsCV(max_iter=5)
lars_cv.fit(X, y)
assert_true(len(w) == 0)
assert_true(len(w) >= 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. See below

result[val] = assert_no_warnings(cross_validate, estimator, X, y,
return_train_score=val)

result[val] = assert_warns_message(FutureWarning, msg_nsplit,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly this needs to assert that no other warnings are issued. Secondly it needs to be clear to the developer tasked with completing this deprecation in a couple of years' time what it used to be asserting, i.e. that it was asserting an absence of warnings.t

Thirdly, the right solution is actually to pass an explicit value for cv here. Sorry I didn't notice that before.

Copy link
Copy Markdown
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @shaz13 are you still working on it?


Model evaluation and meta-estimators

- The default of ``n_splits`` parameter of :class:`model_selection.KFold` is
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a API changes summary section for it. Also, I think we need a reason for the API change. See e.g.,

The default value of ``gamma`` parameter of :class:`svm.SVC`,
:class:`~svm.NuSVC`, :class:`~svm.SVR`, :class:`~svm.NuSVR`,
:class:`~svm.OneClassSVM` will change from ``'auto'`` to ``'scale'`` in
version 0.22 to account better for unscaled features. :issue:`8361` by
:user:`Gaurav Dhingra <gxyd>` and :user:`Ting Neo <neokt>`.

splits via the ``split`` method.
"""
if cv is None:
warnings.warn("The default value of n_splits=3 is deprecated"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

RandomizedSearchCV(LinearSVC(random_state=0), grid,
n_iter=2, iid=False)]

msg_nsplit = ("The default value of n_splits=3 is deprecated in "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the message here. It would be better if you can refer to merged PRs so that users can get similar messages for similar issues.

fit_func = ignore_warnings(estimator.fit,
category=ConvergenceWarning)
result[val] = assert_no_warnings(fit_func, X, y).cv_results_
result[val] = assert_warns_message(FutureWarning, msg_nsplit,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass an explicit value for cv?

grid = GridSearchCV(SVC(gamma='scale'), param_grid={'C': [1]}, cv=3)
# no warning with equally sized test sets
assert_no_warnings(grid.fit, X, y)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change

@shaz13
Copy link
Copy Markdown
Contributor Author

shaz13 commented Jul 9, 2018

ping @qinhanmin2014. Unfortunately, I cannot contribute to opensource source code until I am in a contract with my employer. Please feel free to reassign the issue to another contributor. Thanks!

@qinhanmin2014
Copy link
Copy Markdown
Member

Thanks @shaz13 for your great work so far :)

@wenhaoz-fengcai
Copy link
Copy Markdown

@qinhanmin2014 Should I open up a new PR if I want to take over this fix?

@qinhanmin2014
Copy link
Copy Markdown
Member

@jiaowoshabi You should open another PR :)
Contributions are always welcomed, but I just want to clarify that the issue is difficult and current PR only solves a small part of it.
See #11129 (comment)

@aboucaud
Copy link
Copy Markdown
Contributor

@jiaowoshabi are you already working on it ? Otherwise I'd like to take over this PR for the current sprint.

@wenhaoz-fengcai
Copy link
Copy Markdown

wenhaoz-fengcai commented Jul 16, 2018

@aboucaud I'm not working on this PR. Go ahead on your fix

@aboucaud
Copy link
Copy Markdown
Contributor

@amueller @jnothman #11557 has been merged so this PR can be closed now.

@shaz13 shaz13 deleted the kfold-cv-default-5 branch September 16, 2018 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants