[MRG+1] Changes default for return_train_score to False by thechargedneutron · Pull Request #9677 · scikit-learn/scikit-learn

thechargedneutron · 2017-09-02T23:34:16Z

Reference Issue

Fixes #9621

What does this implement/fix? Explain your changes.

Changes the default value of return_train_score to 'warn'. Raises a warning if computing the training scores takes a significant amount of time.

Any other comments?

The warning message needs improvement.


thechargedneutron · 2017-09-02T23:56:08Z

@jnothman Kindly suggest a suitable warning message.
Also, please review. I guess this will need a lot of changes, mostly in the documentation.

jnothman · 2017-09-03T03:23:03Z

sklearn/model_selection/_validation.py

                                  is_multimetric)
+            score_train_time = time.time() - start_time - fit_time - score_time
+            if (score_train_time >= 0.1*fit_time and
+               time.time() - start_time) > 5:


Parentheses in a strange place here...

And can't compare absolute time to 5 in a single fit. It will often only be a substantial amount of time over a large number of fits, as in the raising issue.

How about adding a new variable alongside return_train_score in BaseSearchCV, containing the time of the first call to GridSearchCV. Also we need to pass that time variable to the _fit_and_score function.


thechargedneutron · 2017-09-03T08:12:11Z

@jnothman Added a parameter in the function which takes care of the actual start time of GridSearchCV. Please see if this is valid or not.

jnothman

It's not a bad approach, but I suspect it won't receive the consensus to get merged, just because it makes the _fit_and_score interface messier...

@amueller, what do you think of this approach?

jnothman · 2017-09-03T11:13:48Z

sklearn/model_selection/_validation.py

            train_scores = _score(estimator, X_train, y_train, scorer,
                                  is_multimetric)
+            score_train_time = time.time() - start_time - fit_time - score_time
+            if score_train_time >= 0.1*fit_time and time.time() - grid_search_start_time> 5:


This still may mean that training score calculation time took only .5 of a second, which I don't think is quite enough. Maybe better would be to (naively) estimate overall training score time: (time.time() - grid_search_start_time) * score_train_time / (fit_time + score_time + score_train_time) > 5..?
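For illustration, the naive estimate suggested above can be sketched as follows. This is not code from the PR: the function names and the `threshold` parameter are hypothetical, and the sketch assumes the timings of the current fit are representative of every fit run so far.

```python
def estimated_train_score_overhead(elapsed, fit_time, score_time,
                                   score_train_time):
    """Naively estimate how many seconds of the search so far went to
    computing training scores.

    elapsed is the wall-clock time since the search started; the other
    three values are the timings measured for the current fit.
    """
    total = fit_time + score_time + score_train_time
    if total <= 0:
        # Nothing measured yet, so there is nothing to extrapolate from.
        return 0.0
    # Scale the elapsed time by the share spent on training scores.
    return elapsed * score_train_time / total


def should_warn(elapsed, fit_time, score_time, score_train_time,
                threshold=5.0):
    """Return True when the estimated overhead exceeds threshold seconds."""
    overhead = estimated_train_score_overhead(
        elapsed, fit_time, score_time, score_train_time)
    return overhead > threshold
```

With this estimate, a fit whose training-score step takes a fixed share of the time only triggers the warning once the search as a whole has run long enough for that share to add up to more than the threshold.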

thechargedneutron · 2017-09-03T11:21:06Z

@jnothman I also agree that it makes the _fit_and_score method messier. But I could not find an alternative way to keep track of the total time that GridSearchCV would take. You or @amueller may be able to suggest a way that serves the purpose without changing the _fit_and_score interface.

jnothman · 2017-09-03T11:25:48Z

The only other way to do it without changing the _fit_and_score interface is to only raise the warning after fitting all estimators. And if we're going to change _fit_and_score's interface, this is a good solution.


thechargedneutron · 2017-09-04T19:57:26Z

@amueller @lesteve Any suggestions on whether I should go ahead and change the _fit_and_score interface, or would it be better to raise the warning only after the calls to _fit_and_score have finished?

thechargedneutron · 2017-09-06T21:09:31Z

@jnothman Should I continue with changing the function interface, or raise the warning only after the _fit_and_score calls have completed?

jnothman · 2017-09-07T01:20:56Z

I hope we'll get someone else's opinion soon... I suspect we'll land up with warning at the end, but I'm not sure.


lesteve · 2017-09-07T10:06:58Z

What about the seemingly simpler solution of changing the default to return_train_score=False, potentially by adding a FutureWarning that the default is going to be return_train_score=False in 0.22?

thechargedneutron · 2017-09-07T11:47:20Z

@lesteve Yes, this is also a simple solution, will be done once others agree upon this.

jnothman · 2017-09-07T13:31:54Z

I suppose that's an acceptable solution.



thechargedneutron · 2017-09-07T14:33:46Z

@jnothman Kindly review and suggest a suitable FutureWarning message.

thechargedneutron · 2017-09-07T19:34:15Z

Suggestions for a "nicer" warning message are welcome. Otherwise, I think this will work. :)

thechargedneutron · 2017-10-12T05:40:37Z

Even after making the required changes (the tests are still not updated), the following lines of code do not produce a deprecation warning. Is this how DeprecationDict is supposed to behave, or am I missing something?

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
sorted(clf.cv_results_.keys())

lesteve · 2017-10-12T09:50:57Z

Warning should be produced when accessing the cv_results_ key for training scores:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score']  # this is the line that should produce a warning

Note that you should not get any warning if you use return_train_score=True.

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'kernel':('linear','rbf'), 'C':[1,10]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, return_train_score=True)
clf.fit(iris.data, iris.target)
clf.cv_results_['split0_train_score']  # no warning because return_train_score is set to True

Basically, we want users who do not set return_train_score and then access the training scores in cv_results_ to get a warning telling them to set return_train_score to True, because training scores will not be present by default in 0.20.
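For illustration, the warn-on-access behaviour described above can be sketched with a small dict subclass. This is a minimal sketch, not the DeprecationDict implementation from the PR: the class name `WarnOnAccessDict`, its `add_warning` method, and the message text are assumptions made for the example.

```python
import warnings


class WarnOnAccessDict(dict):
    """A dict that emits a warning when selected keys are read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._warnings = {}  # key -> (message, category)

    def add_warning(self, key, message, category=FutureWarning):
        """Register a warning to emit whenever key is read with []."""
        self._warnings[key] = (message, category)

    def __getitem__(self, key):
        if key in self._warnings:
            message, category = self._warnings[key]
            # stacklevel=2 points the warning at the caller's line.
            warnings.warn(message, category, stacklevel=2)
        return super().__getitem__(key)


# Hypothetical results; the scores are made-up numbers.
results = WarnOnAccessDict(split0_test_score=0.95, split0_train_score=0.99)
results.add_warning(
    "split0_train_score",
    "Training scores will not be available by default in a future "
    "release; set return_train_score=True if you need them.",
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results["split0_test_score"]   # no warning registered for this key
    results["split0_train_score"]  # emits the registered FutureWarning
```

Listing the keys never triggers the warning, which matches the observation that sorted(clf.cv_results_.keys()) stays silent. In this sketch dict.get() also bypasses the overridden __getitem__, so a fuller version would need to override get as well.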

thechargedneutron · 2017-10-12T12:05:46Z

I am not sure how DeprecationDict is working. Need help on how to implement.

lesteve · 2017-10-13T09:24:07Z

You need to help us help you!

Can you be specific about what your problem is? What have you tried? If there is a failure that you don't understand, can you copy and paste it here?

jnothman · 2017-10-16T02:19:53Z

See my PR at thechargedneutron#2


lesteve · 2017-10-16T09:18:35Z

I restarted the failing build in Travis hoping the timeout was just a glitch.

jnothman · 2017-10-16T09:22:46Z

Do you think this approach is decent, @lesteve? Too messy? It implies that a user no longer gets a warning informing them why the fit is so slow which was the original point...!

lesteve · 2017-10-16T09:36:04Z

I have not looked at the diff of this PR (will do shortly). To be perfectly honest, I think your DeprecationDict suggestion is really neat and this is the best we can do:

We tried to have a warning only if the scoring for train was slow, and we failed because it was a bit messy to implement. Personally I think return_train_score=False is a reasonable default (compute more things only if asked by the user).
I think the DeprecationDict approach is the best way to transition from return_train_score=True to return_train_score=False. People will get the warning if they do not set return_train_score and access the training scores, and then they can decide what they really want.
In 0.21 return_train_score=False will be the default, so the code will be fast by default.

lesteve · 2017-10-17T08:39:55Z

sklearn/model_selection/_search.py

+                            "validation significantly. This is the reason "
+                            "return_train_score will change its default value "
+                            "from True (current behaviour) to False in 0.21. "
+                            "Please set return_train_score explicitly to get "


I think this message should be changed slightly to say something along the lines of "Looks like you are using training scores, you want to set return_train_score=True because return_train_score default will change in 0.21"


lesteve · 2017-10-17T09:43:51Z

@jnothman I pushed two main changes, it would be nice to have your opinion on these:

I simplified the warning message, since it only happens when accessing a training score
DeprecationDict is only used when return_train_score='warn'. This is me being overly cautious/pessimistic mainly and preferring to use a plain dict when return_train_score is set explicitly. I can revert this if you feel this is too much.

I am going to reset the LGTM count and add a +1 from me.

amueller · 2017-10-17T20:27:47Z

sklearn/model_selection/tests/test_search.py

+            'which will not be available by default '
+            'any more in 0.21. If you need training scores, '
+            'please set return_train_score=True').format(key)
+        train_score = assert_warns_message(FutureWarning, msg,


Shouldn't we assert that there is no warning for the other values of return_train_score, and that there is no warning for the other keys when it is 'warn'?

Otherwise LGTM.
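For illustration, one way to check both directions with only the standard library is to record warnings per key. This is a sketch rather than the test added in the PR: `keys_that_warn` and `NoisyTrainScores` are hypothetical names, and the stand-in mapping simply warns for any key ending in "_train_score".

```python
import warnings


def keys_that_warn(mapping, category=FutureWarning):
    """Return the set of keys whose [] access emits a warning of category."""
    warned = set()
    for key in list(mapping.keys()):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            mapping[key]
        if any(issubclass(w.category, category) for w in caught):
            warned.add(key)
    return warned


class NoisyTrainScores(dict):
    """Stand-in mapping that warns when a training-score key is read."""

    def __getitem__(self, key):
        if key.endswith("_train_score"):
            warnings.warn(
                "set return_train_score=True to keep training scores",
                FutureWarning,
            )
        return super().__getitem__(key)


# Made-up values standing in for a cv_results_ mapping.
demo = NoisyTrainScores(
    mean_fit_time=0.1, split0_test_score=0.9, split0_train_score=1.0)

# Only the training-score key should warn; every other key stays silent.
assert keys_that_warn(demo) == {"split0_train_score"}
```

The same helper applied to a plain dict returns an empty set, which is the behaviour expected when return_train_score is set explicitly.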

jnothman · 2017-10-17T23:39:01Z

Thanks all


thechargedneutron added 2 commits September 3, 2017 05:00

Initial commit

cb0cabc

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into searchcv

ee82d80

jnothman reviewed Sep 3, 2017


thechargedneutron added 3 commits September 3, 2017 13:12

grid_search_start_time variable added

9c89621

doc changed

fe604ce

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into searchcv

7a1cfb7

jnothman reviewed Sep 3, 2017


thechargedneutron added 4 commits September 7, 2017 19:18

changes reverted

42e32f2

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into searchcv

1a06464

Default od return_train_score changed to False

da7b680

FutureWarning added

4aa929a

thechargedneutron added 5 commits September 7, 2017 22:16

Corresponding tests changed

2a7d5c4

missing changes done

2002cc3

missing return_train_score added

5e7b883

extra , removed

a54b2f0

space added

9f96446

thechargedneutron changed the title ~~[WIP] Adds warning in GridSearchCV if calculating train score is unduly expensive.~~ [MRG] Adds warning in GridSearchCV if calculating train score is unduly expensive. Sep 7, 2017

jnothman changed the title ~~[MRG] Adds warning in GridSearchCV if calculating train score is unduly expensive.~~ [MRG] Changes default for return_train_score to False Sep 8, 2017

pep8 failures removed

65c03a6

jnothman and others added 3 commits October 16, 2017 10:14

return_train_score='warn' with DeprecationDict (#2)

e7ec068

pep8 failures removed

bbf1dfd

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into searchcv

1209dca

lesteve reviewed Oct 17, 2017


lesteve added 2 commits October 17, 2017 11:04

Use DeprecationDict only when return_train_score="warn"

75e22b6

Make docstrings more uniform

Improve warning

c11133a

lesteve changed the title ~~[MRG+2?] Changes default for return_train_score to False~~ [MRG+1] Changes default for return_train_score to False Oct 17, 2017

amueller reviewed Oct 17, 2017


Test other keys are accessible without warning

1e844bd

jnothman merged commit 766ba93 into scikit-learn:master Oct 17, 2017

jnothman added a commit that referenced this pull request Oct 17, 2017

Add DeprecationDict for #9677

688950e

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 17, 2017

Add DeprecationDict for scikit-learn#9677

754f73a

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 17, 2017

[MRG+1] DEPREC Change default for return_train_score to False (scikit-learn#9677)

83411db

thechargedneutron deleted the searchcv branch October 18, 2017 06:18

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

Add DeprecationDict for scikit-learn#9677

dffe362

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] DEPREC Change default for return_train_score to False (scikit-learn#9677)

a0d477e

qinhanmin2014 mentioned this pull request Dec 6, 2017

[MRG+1] GridSearchCV iid #9379

Merged

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

Add DeprecationDict for scikit-learn#9677

785c121

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] DEPREC Change default for return_train_score to False (scikit-learn#9677)

c72603e
