MRG Training Score in Gridsearch #1742
sklearn/grid_search.py
Why did you rename *_validation_score to *_test_score? validation sounds more correct in a CV setting. Don't you think?
First I thought training and test build a nicer pair. Then I thought validation would be better, but I didn't change it back. Will do once my slides are done ;)
Alright, as you wish. I don't have a strong opinion on this either.
What about measuring the
It's on the todo. Is there a better way than using
I think
Fixed doctests, rebased, squashed. Should be good to go.
examples/svm/plot_rbf_parameters.py
Is this a left-over of some experiment? It should be removed if it's not useful.
Whoops. Actually, I still need to have a look at how it renders on the website.
Please add some smoke tests for the new tuple items: for instance check that all of them are positive and that train_score is lower than 1.0.
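A minimal sketch of what such a smoke test could look like, using only the standard library. The `GridEntry` record and its field names are hypothetical stand-ins, not necessarily the tuple layout used in this PR. The check assumes an accuracy-like score that is positive. It also allows a training score of exactly 1.0, since a perfectly fit training fold can reach that value.

```python
from collections import namedtuple

# Hypothetical stand-in for one entry of the grid search output; the field
# names are illustrative and not necessarily the ones used in this PR.
GridEntry = namedtuple(
    "GridEntry",
    ["parameters", "train_score", "validation_score", "train_time", "test_time"],
)

NUMERIC_FIELDS = ("train_score", "validation_score", "train_time", "test_time")


def smoke_check(entries):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for entry in entries:
        # Every numeric item is expected to be strictly positive.
        for name in NUMERIC_FIELDS:
            if getattr(entry, name) <= 0:
                problems.append(
                    "{} is not positive for {!r}".format(name, entry.parameters)
                )
        # An accuracy-like training score can never exceed 1.0.
        if entry.train_score > 1.0:
            problems.append(
                "train_score exceeds 1.0 for {!r}".format(entry.parameters)
            )
    return problems
```

In a real test the entries would come from the fitted grid search object rather than being built by hand, and the test would assert that the returned list is empty.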
Other than the above comments, this looks good to me.
Also added some tests.
" wide gap ... with the validation score"
See an alternative patch at https://github.com/jnothman/scikit-learn/tree/grid_search_more_info. Note that I have chosen different field names, aiming for consistency and memorability, if not preciseness of name.
@jnothman btw, does your version work with lists of dicts as
I don't think it's better, but it's certainly no worse: it provides exactly the same ordering according to It doesn't do anything particular to PR forthcoming.
On 03/13/2013 01:07 AM, jnothman wrote:
Added docstrings for GridSearchCV, RandomizedSearchCV and fit_grid_point. In fit_grid_point I used test_score rather than validation_score, as the split is given to the function. The RBF SVM grid search example now also shows training scores, which illustrate overfitting for high C, and training/prediction times, which basically serve to illustrate that this is possible. Maybe random forests would be better for evaluating training times?
This PR adds training scores to the GridSearchCV output, as wished for by @ogrisel.
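To make the idea concrete, here is a small standard-library sketch of a grid search that records a training score next to the validation score for each parameter setting. It does not use the scikit-learn API from this PR. The model is swapped for a toy 1-D k-nearest-neighbours classifier, and the `GridPoint` record, its field names and the helper functions are all assumptions made for illustration. With k=1 every training point is its own nearest neighbour, so the training score is 1.0 regardless of label noise. This is the same kind of overfitting signal the description mentions for high C.

```python
import random
import time
from collections import namedtuple

# Hypothetical record; field names are illustrative, not taken from the PR.
GridPoint = namedtuple(
    "GridPoint", ["params", "mean_train_score", "mean_validation_score", "mean_time"]
)


def make_data(n=60, noise=0.2, seed=0):
    """1-D points labelled by x > 0.5, with a fraction of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > len(nearest))


def accuracy(train, data, k):
    """Fraction of points in data predicted correctly from train."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def grid_search(data, ks, n_folds=3):
    """Score each k on the training and validation part of every fold."""
    results = []
    for k in ks:
        train_scores, val_scores, times = [], [], []
        for fold in range(n_folds):
            val = data[fold::n_folds]
            train = [p for i, p in enumerate(data) if i % n_folds != fold]
            start = time.perf_counter()
            train_scores.append(accuracy(train, train, k))
            val_scores.append(accuracy(train, val, k))
            times.append(time.perf_counter() - start)
        results.append(GridPoint(
            {"k": k},
            sum(train_scores) / n_folds,
            sum(val_scores) / n_folds,
            sum(times) / n_folds,
        ))
    return results


if __name__ == "__main__":
    for point in grid_search(make_data(), ks=[1, 5, 15]):
        print(point)
```

A wide gap between the mean training score and the mean validation score for a setting suggests that setting overfits. The recorded time here covers scoring only, since this toy model has no separate fitting step.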