[MRG+1] GridSearchCV iid#9379
Conversation
…model_selection module
|
this makes cross_val_scores(..).mean() equivalent to gridsearchcv mean cv scores identical? |
|
yes |
|
well in version 0.21 by default |
… test set sized different.
|
I think I misunderstood the iid parameter (again). It only reweights the mean computation but does not change the scoring. so it will warn whenever the test set sizes are unequal now. |
doc/whats_new.rst
Outdated
| - The ``iid`` parameter of :class:`model_selection.GridSearchCV` and | ||
| :class:`model_selection.RandomizedSearchCV` has been deprecated and will | ||
| be removed in version 0.21. Future behavior will be the current default | ||
| behavior (equivalent to ``iid=True``). |
There was a problem hiding this comment.
I'd also appreciate a note just explaining that weighting the average in CV is not appropriate (and we don't do it in cross_val_score either).
|
Resolve conflicts, change version numbers. LGTM. It's okay if this happens slowly. Let's just make it happen. Another review? |
sklearn/model_selection/_search.py
Outdated
| ..deprecated:: 0.19 | ||
| Parameter ``iid`` has been deprecated in version 0.19 and | ||
| will be removed in 0.21. | ||
| Future (and default) behavior is equivalent to `iid=true`. |
There was a problem hiding this comment.
these version numbers are not consistent with what's new
|
besides LGTM @amueller you need to rebase |
|
Thanks for the review @agramfort, will fix it next week, afk camping. |
|
should be good now. |
doc/whats_new/v0.20.rst
Outdated
|
|
||
| - The default of the ``iid`` parameter of :class:`model_selection.GridSearchCV` and | ||
| :class:`model_selection.RandomizedSearchCV` will change from ``True`` to ``False`` | ||
| in version 0.22, and will be removed in version 0.24. |
There was a problem hiding this comment.
"the parameter will be removed altogether"
sklearn/model_selection/_search.py
Outdated
| the folds, and the loss minimized is the total loss per sample, | ||
| and not the mean loss across the folds. | ||
| and not the mean loss across the folds. Default is True, | ||
| but will change to False in version 0.21. |
There was a problem hiding this comment.
I think (particularly given that we tend to get complaints about deprecations) that it's worth adding a few words on why, or basically saying "We now consider the iid=True formulation to be optimising an incorrect cross validation objective."
sklearn/model_selection/_search.py
Outdated
| iid : boolean, default=None | ||
| If True, the data is assumed to be identically distributed across | ||
| the folds, and the loss minimized is the total loss per sample, | ||
| and not the mean loss across the folds. |
There was a problem hiding this comment.
Please make identical to above.
sklearn/model_selection/_search.py
Outdated
| as in '2*n_jobs' | ||
|
|
||
| iid : boolean, default=True | ||
| iid : boolean, default=None |
There was a problem hiding this comment.
hm... that's not entirely clear, is it? though I don't really have a better idea. Usually we use None for deprecated parameters.
There was a problem hiding this comment.
Generally, I prefer describing the defaults than stating them here anyway, particularly when they are something semantically underspecified like "None". But seeing default='warn' or default='deprecated' in the signature is quite user-friendly, IMO. None is not.
qinhanmin2014
left a comment
There was a problem hiding this comment.
LGTM. I've fixed the conflict. Already +3 so merge? @jnothman @amueller @agramfort
|
Seems that we now need to modify the test (introduced in #9677) to make CIs green. |
|
This pull request introduces 1 alert - view on lgtm.com new alerts:
Comment posted by lgtm.com |
|
I've resolved the conflict, the test error and the warning from lgtm. @amueller hope you won't mind. |
sklearn/model_selection/_search.py
Outdated
| iid : boolean, default=True | ||
| iid : boolean, default='warn' | ||
| If True, the data is assumed to be identically distributed across | ||
| the folds, and the loss minimized is the total loss per sample, |
There was a problem hiding this comment.
This has always been a weird description. Could we clarify that it is an average score across folds, weighted by the number of samples in each test set...?
doc/whats_new/v0.20.rst
Outdated
| - The default of the ``iid`` parameter of :class:`model_selection.GridSearchCV` and | ||
| :class:`model_selection.RandomizedSearchCV` will change from ``True`` to ``False`` | ||
| in version 0.22, and the parameter will be removed in version 0.24 altogether. | ||
| :issue:`9085` by :user:`Laurent Direr <ldirer>` and `Andreas Müller`_. |
There was a problem hiding this comment.
Add a note that: This parameter is of greatest practical significance where the sizes of different test sets in cross-validation were very unequal, i.e. in group-based CV strategies.
|
ping @jnothman I update the document and what's new accordingly. Hope that I don't make anyone unhappy :) |
|
I still don't really understand why this weighted average is invalid. But
the name iid is certainly unhelpful. And it makes the code harder to maintain and harder to compare with cross_val_score(...).mean().
|
|
Seems that I mistakenly suppose there's already consensus since it's marked as blocker and is already +2 before l came. |
|
Yes, I think we should merge this. But I think there had been some confusion, where some thought that iid=True was doing something like |
|
@jnothman I might think the doc is enough, we now have a warning if users do not set iid |
|
@qinhanmin2014 thanks for finishing up. I guess the main issue with this was that it will add a lot of warnings. How many warnings do we get on master now? |
|
(answer: a bunch :-/) |
|
Oh, good point.
|
|
No saying it was a bad idea to merge, but that had been a concern (haven't been following) |
|
Just stumble into that in the SciPy tutorial. I get warning and then you will be tempted to turn |
That's why the deprecation cycle is extra long. You can either ignore the warning it with the warnings module, or by setting |
True. I think that I will ignore this warning specifically. |
tl;dr `iid` has been deprecated and the updated behavior corresponds to `iid=True`. References: - Deprecation warning and explanation of associated behavior: scikit-learn/scikit-learn#9379. - Deprecation: scikit-learn/scikit-learn#13834.
Continuation of #9103. Fixes #9085.
Closes #9103