[MRG+1] Changes default for return_train_score to False#9677
[MRG+1] Changes default for return_train_score to False#9677jnothman merged 73 commits intoscikit-learn:masterfrom
Conversation
|
@jnothman Kindly suggest a suitable warning message. |
| is_multimetric) | ||
| score_train_time = time.time() - start_time - fit_time - score_time | ||
| if (score_train_time >= 0.1*fit_time and | ||
| time.time() - start_time) > 5: |
There was a problem hiding this comment.
Parentheses in a strange place here...
And can't compare absolute time to 5 in a single fit. It will often only be a substantial amount of time over a large number of fits, as in the raising issue.
There was a problem hiding this comment.
How about adding a new variable alongside return_train_score in BaseSearchCV, containing the time of the first call to GridSearchCV. Also we need to pass that time variable to the _fit_and_score function.
|
@jnothman Added a parameter in the function which takes care of the actual start time of GridSearchCV. Please see if this is valid or not. |
| train_scores = _score(estimator, X_train, y_train, scorer, | ||
| is_multimetric) | ||
| score_train_time = time.time() - start_time - fit_time - score_time | ||
| if score_train_time >= 0.1*fit_time and time.time() - grid_search_start_time> 5: |
There was a problem hiding this comment.
This still may mean that training score calculation time took only .5 of a second, which I don't think is quite enough. Maybe better would be to (naively) estimate overall training score time: (time.time() - grid_search_start_time) * score_train_time / (fit_time + score_time + score_train_time) > 5..?
|
The only other way to do it without changing the _fit_and_score interface
is to only raise the warning after fitting all estimators. And if we're
going to change _fit_and_score's interface, this is a good solution.
…On 3 September 2017 at 21:21, Kumar Ashutosh ***@***.***> wrote:
@jnothman <https://github.com/jnothman> I also agree upon the fact that
it makes the _fit_and_score method more messy. But I could not find and
alternative to keep track of total time that GridSearchCV would take. You
or @amueller <https://github.com/amueller> may suggest a way which serves
the purpose without changing _fit_and_score interface.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9677 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz65tS2hXIxc9O6p11nNoBA8BmsN8Kks5seoukgaJpZM4PLCkj>
.
|
|
@jnothman Should I continue with changing the function interface or raise warning only after completing |
|
I hope we'll get someone else's opinion soon... I suspect we'll land up
with warning at the end, but I'm not sure.
… |
|
What about the seemingly simpler solution of changing the default to |
|
@lesteve Yes, this is also a simple solution, will be done once others agree upon this. |
|
I suppose that's an acceptable solution.
…On 7 September 2017 at 21:47, Kumar Ashutosh ***@***.***> wrote:
@lesteve <https://github.com/lesteve> Yes, this is also a simple
solution, will be done once others agree upon this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9677 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz63eCrLSWgNrp09pyhuLx8FrI2W7Nks5sf9fKgaJpZM4PLCkj>
.
|
|
@jnothman Kindly review and suggest a suitable FutureWarning message. |
|
Suggestions for a "nicer" warning message. Otherwise, I think this will work. :) |
|
Even after doing the required changes (change in test still not made), the following lines of code does not produce a deprecation warning, is it how DeprecationDict is supposed to behave or I am missing out something. |
|
Warning should be produced when accessing the from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score'] # this is the line that should produce a warningNote that you should not get any warning if you use from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, return_train_score=True)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score'] # no warning because return_train_score is set to TrueBasically what we want users who do not set |
|
I am not sure how DeprecationDict is working. Need help on how to implement. |
|
You need to help us help you! Can you be specific about your problem is? what have you tried? If there is a failure that you don't understand, can you copy and paste it here? |
|
See my PR at thechargedneutron#2 |
|
I restarted the failing build in Travis hoping the timeout was just a glitch. |
|
Do you think this approach is decent, @lesteve? Too messy? It implies that a user no longer gets a warning informing them why the |
|
I have not looked at the diff of this PR (will do shortly). To be perfectly honest, I think your DeprecationDict suggestions is really neat and this is the best we can do:
|
sklearn/model_selection/_search.py
Outdated
| "validation significantly. This is the reason " | ||
| "return_train_score will change its default value " | ||
| "from True (current behaviour) to False in 0.21. " | ||
| "Please set return_train_score explicitly to get " |
There was a problem hiding this comment.
I think this message should be changed slightly to say something along the lines of "Looks like you are using training scores, you want to set return_train_score=True because return_train_score default will change in 0.21"
Make docstrings more uniform
|
@jnothman I pushed two main changes, it would be nice to have your opinion on these:
I am going to reset the LGTM count and add a +1 from me. |
| 'which will not be available by default ' | ||
| 'any more in 0.21. If you need training scores, ' | ||
| 'please set return_train_score=True').format(key) | ||
| train_score = assert_warns_message(FutureWarning, msg, |
There was a problem hiding this comment.
shouldn't we assert that there is no warning for the other vals and that there is no warning for the other keys for 'warn'?
Otherwise LGTM.
|
Thanks all |
Reference Issue
Fixes #9621
What does this implement/fix? Explain your changes.
Changes the default value of
return_train_scoretowarn. Raises a warning if train score takes more time.Any other comments?
The warning message needs improvement.