[MRG+1] Change CV defaults to 5 (#11557)
Conversation
I see this is still WIP but it could be worth mentioning that you will have to decorate the tests which use the cv using the …
GaelVaroquaux left a comment:
Looks great so far. A couple of minor comments.
doc/modules/cross_validation.rst (outdated)

    >>> from sklearn.model_selection import StratifiedKFold
    >>> X = np.ones(10)
doc/whats_new/v0.20.rst (outdated)

    - The default number of cross-validation folds ``cv`` and the default number of
      splits ``n_splits`` in the :class:`model_selection.KFold`-like splitters will change
      from 3 to 5 in 0.22 to account for good practice in the community.
"to account for good practice in the community." => "as 3-fold has a lot of variance".
sklearn/model_selection/_split.py (outdated)

    NSPLIT_WARNING = (
        "You should specify a value for 'n_splits' instead of relying on the "
        "default value. Note that this default value of 3 is deprecated in "
Instead of "Note...", I would say "This default value will change from 3 to 5 in version 0.22."
I canceled the Travis build as @aboucaud is pushing a new version soon.
sklearn/model_selection/_split.py (outdated)

        """

    -   def __init__(self, n_splits=3, shuffle=False,
    +   def __init__(self, n_splits=None, shuffle=False,
I thought we're gonna use 'warn' from now on?

You want to replace all None by "warn"? Fine with me.

@amueller for n_splits only, or cv as well?
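For context, a minimal sketch of the string-sentinel pattern under discussion (the function name and fallback logic here are illustrative, not the PR's actual code). A string like ``'warn'`` is used rather than ``None`` because ``None`` can itself be a meaningful, explicitly passed value for ``cv``:

```python
import warnings

CV_WARNING = (
    "You should specify a value for 'cv' instead of relying on the "
    "default value. The default value will change from 3 to 5 "
    "in version 0.22."
)

def check_cv(cv="warn"):
    """Resolve cv, warning when the caller relied on the default.

    Illustrative only: 'warn' marks "the caller did not pass anything",
    which None could not do unambiguously.
    """
    if cv == "warn":
        warnings.warn(CV_WARNING, FutureWarning)
        cv = 3  # current default, slated to become 5 in 0.22
    return cv
```

Callers that pass ``cv`` explicitly see no warning; only reliance on the default triggers it.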
Looks good apart from None as sentinel vs 'warn'.

lgtm

We'll merge when Travis is ready.

Test errors :-/
doc/whats_new/v0.20.rst (outdated)

      splits ``n_splits`` in the :class:`model_selection.KFold`-like splitters will change
      from 3 to 5 in 0.22 as 3-fold has a lot of variance.
      :issue:`11129` by :user:`Alexandre Boucaud <aboucaud>`.
Should this be the number of the PR, not the issue?

Dunno, you tell me, sprint master.
Off to bed, will finish this tomorrow. Most of the work should be behind us now.
Green ✌️! I still did not properly address @amueller's comment since I only added … The difficulty is to separate the cases in which it was about … I could try to unify the warning messages (below) to catch a better part of the message:

    NSPLIT_WARNING = (
        "You should specify a value for 'n_splits' instead of relying on the "
        "default value. The default value will change from 3 to 5 "
        "in version 0.22.")

    CV_WARNING = (
        "You should specify a value for 'cv' instead of relying on the "
        "default value. The default value will change from 3 to 5 "
        "in version 0.22.")

WDYT? @GaelVaroquaux @amueller
Is skipping all the doctests the right way to make Travis green?
When I merged, I agree it is probably not a good thing. Many of these failing tests used the default value for … I just end up having so many modifications in this PR that cannot be properly checked or reviewed that I would be in favor of having a following PR address the doctests, using ``# doctest: +SKIP`` as an anchor.
sklearn/feature_selection/rfe.py (outdated)

    Refer :ref:`User Guide <cross_validation>` for the various
    cross-validation strategies that can be used here.

    .. deprecated:: 0.20
Can you make this ``versionchanged`` instead of ``deprecated``? (because the keyword itself is not deprecated)

Done. We should add a line in the contributing.rst then to specify that.
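For reference, the distinction being made (wording illustrative, not the PR's exact text): ``deprecated`` implies the keyword itself is going away, while ``versionchanged`` documents a behavior change such as a new default:

```rst
.. versionchanged:: 0.20
    The default value of ``cv`` will change from 3-fold to 5-fold in v0.22.
```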
Guillaume: if you set it to 5 manually in the doc examples, is it then still needed to skip?
> I just end up having so many modifications in this PR that cannot be properly checked or reviewed that I would be in favor of having a following PR address the doctests, using ``# doctest: +SKIP`` as an anchor.

No, this is really bad. We need to have tests passing, not skipped. This is our guarantee of the quality of the codebase.
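The concern can be demonstrated with the standard ``doctest`` module itself: an example marked ``+SKIP`` is never executed, so even a wrong expected output raises no failure (toy docstring, for illustration only):

```python
import doctest

def fragile():
    """
    >>> 1 + 1  # doctest: +SKIP
    3
    """

# The expected output above is wrong, but the example is skipped,
# so the runner reports no failure at all.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(fragile):
    runner.run(test)
print("failures:", runner.failures)
```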
Ok, I'm on it.
@GaelVaroquaux can you interrupt the build on the first commit to let the last one build?

It restarts automatically each time you push.
doc/modules/model_evaluation.rst (outdated)

        >>> X, y = iris.data, iris.target
        >>> clf = svm.SVC(gamma='scale', random_state=0)
    -   >>> cross_val_score(clf, X, y, scoring='recall_macro')  # doctest: +ELLIPSIS
    +   >>> cross_val_score(clf, X, y, scoring='recall_macro')  # doctest: +SKIP
doc/modules/model_evaluation.rst (outdated)

        >>> from sklearn.svm import LinearSVC
    -   >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
    +   >>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
    +   ...                     scoring=ftwo_scorer)  # doctest: +SKIP
doc/modules/model_evaluation.rst (outdated)

        ... scoring=scoring)  # doctest: +SKIP
        >>> # Getting the test set true positive scores
    -   >>> print(cv_results['test_tp'])  # doctest: +NORMALIZE_WHITESPACE
    +   >>> print(cv_results['test_tp'])  # doctest: +SKIP
GaelVaroquaux left a comment:
I saw a few changes in the doctest pragmas that didn't look right. Aside from that, +1 for merge.
doc/modules/ensemble.rst (outdated)

    -   >>> scores = cross_val_score(clf, iris.data, iris.target)
    -   >>> scores.mean()  # doctest: +ELLIPSIS
    +   >>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    +   >>> scores.mean()
I am very surprised that ``# doctest: +ELLIPSIS`` was removed.
doc/modules/learning_curve.rst (outdated)

        array([[0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
               [0.93..., 0.94..., 0.92..., 0.91..., 0.92...],
               [0.51..., 0.52..., 0.49..., 0.47..., 0.49...]])
        >>> valid_scores  # doctest: +ELLIPSIS
Here, I think that keeping "+NORMALIZE_WHITESPACE" would be a good idea.
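For reference, the two pragmas under discussion do different jobs: ``+ELLIPSIS`` lets ``...`` in the expected output match any text (useful for long floats), while ``+NORMALIZE_WHITESPACE`` makes runs of whitespace compare equal. A self-contained sketch (toy docstring, illustrative only):

```python
import doctest

def pragma_demo():
    """
    >>> 2 / 3  # doctest: +ELLIPSIS
    0.666...
    >>> print("a", "b")  # doctest: +NORMALIZE_WHITESPACE
    a        b
    """

# Both examples pass only because of their pragmas: the float output is
# longer than "0.666", and the printed spacing differs from the expected.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(pragma_demo):
    runner.run(test)
print("failures:", runner.failures, "tries:", runner.tries)
```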
Why did you remove many ``# doctest: +NORMALIZE_WHITESPACE`` and ``+ELLIPSIS`` pragmas?
@GaelVaroquaux Alex made the requested changes. I think it's good to go now.

LGTM. Merging.

Ohhh yeahhh!
Reference Issues/PRs

Fixes #11129 and takes over the stalled PR #11139.

What does this implement/fix? Explain your changes.

Add a warning for models that do not specify an explicit value for ``cv`` or ``n_splits``, to prepare for the deprecation of the current default of 3 and its update to 5.