[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics by raghavrv · Pull Request #7388 · scikit-learn/scikit-learn

raghavrv · 2016-09-11T23:39:29Z

Supercedes #2759
Fixes #1837

TODO

Currently, in master

scoring can only be a single string ('precision' etc) or a single callable (make_scorer(precision_score), custom_scorer).

In this PR

scoring can now be a list/tuple like ('precision', 'accuracy'...) or a dict like {'precision': make_scorer(precision_score), 'accuracy score': 'accuracy', 'custom': custom_scorer_callable}
If (and only if) the scoring is of multimetric type, the return of cross_val_score / learning_curve / validation_curve will be dict mapping scorer_names to their corresponding train_scores or test_scores.
GridSearchCV's attributes best_index_, best_params_, best_score_ will correspond to the metric set at refit param. If refit is simply True an error is raised.
GridSearchCV's cv_result_ attribute will consist of keys ending with scorer names for multiple metrics...

A sample plot on multiple metric search for min_samples_split in dtc (click on the plot to go to the example hosted at circle ci)

cc: @jnothman @amueller @vene

jnothman · 2016-09-12T03:07:25Z

(If it's focused on cross_val_score then it doesn't supersede #2579...)

I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_, but with each value a scalar. Perhaps a parameter would switch it to returning the split results as an array. The same functionality could instead be rolled into cross_val_score, but I haven't yet deeply considered the benefits of either approach.

raghavrv · 2016-09-12T09:50:44Z

I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_

For a single param evaluation, I think it's easier to simply call mean/std directly on the return value of cross_val_score...

jnothman · 2016-09-12T11:10:13Z

Yes, but I mean to also get times, training score, multiple param results,
etc.

On 12 September 2016 at 19:50, Raghav RV notifications@github.com wrote:

I've been thinking of a function cross_validate that returns a dict like
GridSearchCV.cv_results_

For a single param evaluation, I think it's easier to simply call mean/std
directly on the return value of cross_val_score...

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7388 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz65vvX4y7jyLTJasCJTeI4OjEMvItks5qpSB2gaJpZM4J6Iwm
.

jnothman · 2016-09-12T11:10:17Z

not multiple param, multiple metric

On 12 September 2016 at 21:10, Joel Nothman joel.nothman@gmail.com wrote:

Yes, but I mean to also get times, training score, multiple param results,
etc.

On 12 September 2016 at 19:50, Raghav RV notifications@github.com wrote:

I've been thinking of a function cross_validate that returns a dict like
GridSearchCV.cv_results_

For a single param evaluation, I think it's easier to simply call
mean/std directly on the return value of cross_val_score...

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#7388 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAEz65vvX4y7jyLTJasCJTeI4OjEMvItks5qpSB2gaJpZM4J6Iwm
.

raghavrv · 2016-09-12T11:42:33Z

Maybe as a separate function cross_validate as rolling it into cross_val_score will complicate the common man's use case? (I believe not everyone wants multiple metric support?)

Thoughts @vene @amueller @agramfort

I thought we could simply have

scoring as a list of predefined metric strings / dict of names --> scorers. - The scores will now be a dict of names --> scores
scoring as a single string / callable - The scores will be, like before, an array.

agramfort · 2016-09-12T11:50:23Z

can you write a usage script the way you see? code snippets make things really concrete

raghavrv · 2016-09-15T13:35:06Z

Thanks for the comment @agramfort. I will post a sample script soon.

raghavrv · 2016-09-16T10:10:32Z

And @GaelVaroquaux thanks for the comment at #7435. Could you clarify what kind of output you have in mind for cross_val_score when multiple metrics are to be evaluated, if not a dict?

@agramfort this is the usage I had in mind -

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> from sklearn.datasets import make_regression

>>> dtc = DecisionTreeRegressor()
>>> X, y = make_regression(n_samples=100, random_state=42)


# For multiple metric - as list of metrics
>>> cross_val_score(dtc, X, y, cv=2, scoring=['neg_mean_absolute_error',
...                                           'neg_mean_squared_error',
...                                           'neg_median_absolute_error'])
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}

# For multiple metric - as dict of callables
>>> cross_val_score(dtc, X, y, cv=2,
...                 scoring={'neg_mean_absolute_error': neg_uae_scorer,
...                          'neg_mean_squared_error': neg_mse_scorer,
...                          'neg_median_absolute_error': neg_mae_scorer})
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}


# For single metric (like before)
>>> cross_val_score(dtc, X, y, cv=2, scoring='neg_mean_absolute_error')
array([-109.20020926, -124.05659102])

raghavrv · 2016-09-16T11:29:19Z

@mblondel WDYT?

amueller · 2016-09-16T15:32:21Z

Ah, for this usecase I actually support a dict. It's a bit weird if the output type changes depending on whether you provide a single metric or not, though.
Again, I think some data format that is easily converted to a pandas dataframe is great.

For callables, couldn't we just use __name__ instead of a dict? Or is that not stable enough?

amueller · 2016-09-16T15:34:19Z

So @jnothman suggested introducing a new function, and I think that might be a good idea. Optionally we could deprecate the current behavior of cross_val_score.

I think the new output should be structured like the cv_results_ with metrics and folds and times and summary statistics.

raghavrv · 2016-09-16T15:42:48Z

For callables, couldn't we just use name instead of a dict? Or is that not stable enough?

Two functions can have same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...

raghavrv · 2016-09-16T15:50:47Z

So @jnothman suggested introducing a new function, and I think that might be a good idea.

cross_validate? Which returns something similar to cv_results_? Okay! any opposition to this from @agramfort or @GaelVaroquaux?

Optionally we could deprecate the current behavior of cross_val_score.

Were you suggesting that cross_val_score also return a dict?

The list of scores as returned by cross_val_score for single metric will still be the most common use case... When people use multiple metric then they should definitely be expected to check the docstring to know how the scores for different metrics will be returned... correct?

Can I suggest that we leave cross_val_score as such (without implementing multiple metric there) and let it remain as a quick easy way to cross-validate for single metric and like Joel suggested cross_validate which will return a dict like cv_result_? There we can easily support multiple metric...

amueller · 2016-09-16T15:53:09Z

Two functions can have same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...

Sorry I missed that. But there is no multiple metric in GridSearchCV yet, right? So this PR would introduce the "scoring parameter as dict" as an interface.

raghavrv · 2016-09-16T15:55:28Z

But there is no multiple metric in GridSearchCV yet, right?

Not yet. Implementing there is very straight forward given our new cv_results_ attr...

But before that we need to fix on _fit_and_score and cross_val_score. They are the time consuming part involving API discussion...

amueller · 2016-09-16T21:03:51Z

Hm so do we also want to support f1_score with averaging=None in this? When doing grid-search, what would be used to decide the maximum, then? Hopefully not the first class.

raghavrv · 2016-09-17T21:51:21Z

No we won't support f1_score without averaging. For all such multiclass scorers you will get an error (as before).

ValueError: multiclass format is not supported

If the user wants it, they can quickly wrap the individual scorers into separate scorers with single value output each... In which case each such scorer will have a ranking associated with it...

And the best_estimator_ / best_index_ / best_score_ all would also have to be a dict with {scorer_name --> val}...

EDIT For single metric, the current format of best_estimator_ / best_index_ / best_score_ is all preserved as such...

jnothman · 2016-09-17T22:52:29Z

In short, we can't design cross_val_score in isolation. Make it work for GridSearchCV then we'll adapt it to cross_val_score. A dict doens't naturally specify one metric as score.

raghavrv · 2016-09-26T16:51:23Z

(This is waiting for ~~#7026~~#7325 to be merged)

jnothman · 2016-09-29T00:33:08Z

Please in 0.19. Please please please. While I have monkey patched this in my own code, I've fixed a colleague's code by simply avoiding cross_val_score altogether...

raghavrv · 2016-09-29T09:48:02Z

Please in 0.19. Please please please

Sure :P I thought 0.18 milestone is already complete with the timing and training score added? I intended this for ~~0.18~~0.19 only...

raghavrv · 2016-09-29T09:48:19Z

*0.19 only

MechCoder · 2017-07-08T16:13:20Z

Great work Raghav!

raghavrv · 2017-07-08T17:29:38Z

OMG OMG OMG. I can't believe this is finally merged. Thanks everyone for the reviews!!! @vene @amueller - My dear mentors, hereby I successfully finish my GSoC 2015 :') :p

raghavrv · 2017-07-08T17:30:07Z

@raghavrv can you open a new PR for that?

Sure :) Sorry I was flying to Austin!

amueller · 2017-07-10T00:37:31Z

@raghavrv see you tomorrow :)

raghavrv · 2017-07-10T22:00:50Z

I think @amueller is suggesting you use this kind of instructive wording in the narrative docs. Perhaps just adopt his wording?

Where is it? I can't find it.

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

Old refers to new tag added with PR scikit-learn#7388

Release 0.19b2 * tag '0.19b2': (808 commits) Preparing 0.19b2 [MRG+1] FIX out of bounds array access in SAGA (scikit-learn#9376) FIX make test_importances pass on 32 bit linux Release 0.19b1 DOC remove 'in dev' header in whats_new.rst DOC typos in whats_news.rst [ci skip] [MRG] DOC cleaning up what's new for 0.19 (scikit-learn#9252) FIX t-SNE memory usage and many other optimizer issues (scikit-learn#9032) FIX broken link in gallery and bad title rendering [MRG] DOC Replace \acute by prime (scikit-learn#9332) Fix typos (scikit-learn#9320) [MRG + 1 (rv) + 1 (alex) + 1] Add a check to test the docstring params and their order (scikit-learn#9206) DOC Residual sum vs. regression sum (scikit-learn#9314) [MRG] [HOTFIX] Fix capitalization in test and hence fix failing travis at master (scikit-learn#9317) More informative error message for classification metrics given regression output (scikit-learn#9275) [MRG] COSMIT Remove unused parameters in private functions (scikit-learn#9310) [MRG+1] Ridgecv normalize (scikit-learn#9302) [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388) Add data_home parameter to fetch_kddcup99 (scikit-learn#9289) FIX makedirs(..., exists_ok) not available in Python 2 (scikit-learn#9284) ...

* releases: (808 commits) Preparing 0.19b2 [MRG+1] FIX out of bounds array access in SAGA (scikit-learn#9376) FIX make test_importances pass on 32 bit linux Release 0.19b1 DOC remove 'in dev' header in whats_new.rst DOC typos in whats_news.rst [ci skip] [MRG] DOC cleaning up what's new for 0.19 (scikit-learn#9252) FIX t-SNE memory usage and many other optimizer issues (scikit-learn#9032) FIX broken link in gallery and bad title rendering [MRG] DOC Replace \acute by prime (scikit-learn#9332) Fix typos (scikit-learn#9320) [MRG + 1 (rv) + 1 (alex) + 1] Add a check to test the docstring params and their order (scikit-learn#9206) DOC Residual sum vs. regression sum (scikit-learn#9314) [MRG] [HOTFIX] Fix capitalization in test and hence fix failing travis at master (scikit-learn#9317) More informative error message for classification metrics given regression output (scikit-learn#9275) [MRG] COSMIT Remove unused parameters in private functions (scikit-learn#9310) [MRG+1] Ridgecv normalize (scikit-learn#9302) [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388) Add data_home parameter to fetch_kddcup99 (scikit-learn#9289) FIX makedirs(..., exists_ok) not available in Python 2 (scikit-learn#9284) ...

* dfsg: (808 commits) Preparing 0.19b2 [MRG+1] FIX out of bounds array access in SAGA (scikit-learn#9376) FIX make test_importances pass on 32 bit linux Release 0.19b1 DOC remove 'in dev' header in whats_new.rst DOC typos in whats_news.rst [ci skip] [MRG] DOC cleaning up what's new for 0.19 (scikit-learn#9252) FIX t-SNE memory usage and many other optimizer issues (scikit-learn#9032) FIX broken link in gallery and bad title rendering [MRG] DOC Replace \acute by prime (scikit-learn#9332) Fix typos (scikit-learn#9320) [MRG + 1 (rv) + 1 (alex) + 1] Add a check to test the docstring params and their order (scikit-learn#9206) DOC Residual sum vs. regression sum (scikit-learn#9314) [MRG] [HOTFIX] Fix capitalization in test and hence fix failing travis at master (scikit-learn#9317) More informative error message for classification metrics given regression output (scikit-learn#9275) [MRG] COSMIT Remove unused parameters in private functions (scikit-learn#9310) [MRG+1] Ridgecv normalize (scikit-learn#9302) [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388) Add data_home parameter to fetch_kddcup99 (scikit-learn#9289) FIX makedirs(..., exists_ok) not available in Python 2 (scikit-learn#9284) ...

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR #6651 * Change tag name Old refers to new tag added with PR #7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

@jnothman

…ate on multiple metrics (scikit-learn#7388) * ENH cross_val_score now supports multiple metrics * DOCFIX permutation_test_score * ENH validate multiple metric scorers * ENH Move validation of multimetric scoring param out * ENH GridSearchCV and RandomizedSearchCV now support multiple metrics * EXA Add an example demonstrating the multiple metric in GridSearchCV * ENH Let check_multimetric_scoring tell if its multimetric or not * FIX For single metric name of scorer should remain 'score' * ENH validation_curve and learning_curve now support multiple metrics * MNT move _aggregate_score_dicts helper into _validation.py * TST More testing/ Fixing scores to the correct values * EXA Add cross_val_score to multimetric example * Rename to multiple_metric_evaluation.py * MNT Remove scaffolding * FIX doctest imports * FIX wrap the scorer and unwrap the score when using _score() in rfe * TST Cleanup the tests. Test for is_multimetric too * TST Make sure it registers as single metric when scoring is of that type * PEP8 * Don't use dict comprehension to make it work in python2.6 * ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation * FIX+TST delegated methods NA when multimetric is enabled... TST Add general tests to GridSearchCV and RandomizedSearchCV * ENH add option to disable delegation on multimetric scoring * Remove old function from __all__ * flake8 * FIX revert disable_on_multimetric * stash * Fix incorrect rebase * [ci skip] * Make sure refit works as expected and remove irrelevant tests * Allow passing standard scorers by name in multimetric scorers * Fix example * flake8 * Address reviews * Fix indentation * Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs * Test that for single metric, 'score' is a key * Typos * Fix incorrect rebase * Compare multimetric grid search with multiple single metric searches * Test X, y list and pandas input; Test multimetric for unsupervised grid search * Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged * Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators * Add example to grid_search.rst * Use the classic tuning of C param in SVM instead of estimators in RF * FIX Remove scoring arg in deafult scorer test * flake8 * Search for min_samples_split in DTC; Also show f-score * REVIEW Make check_multimetric_scoring private * FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed * REVIEW Plot best score; Shorten legends * REVIEW/COSMIT multimetric --> multi-metric * REVIEW Mark the best scores of P/R scores too * Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed" This reverts commit ba766d9. * ENH Use looping for iid testing * FIX use param grid as scipy's stats dist in 0.12 do not accept seed * ENH more looping less code; Use small non-noisy dataset * FIX Use named arg after expanded args * TST More testing of the refit parameter * Test that in multimetric search refit to single metric, the delegated methods work as expected. * Test that setting probability=False works with multimetric too * Test refit=False gives sensible error * COSMIT multimetric --> multi-metric * REV Correct example doc * COSMIT * REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer * REVIEW refit param: Raise for empty strings * TST Invalid refit params * REVIEW Use <scorer_name> alone; recall --> Recall * REV specify when we expect scorers to not be None * FLAKE8 * REVERT multimetrics in learning_curve and validation_curve * REVIEW Simpler coding style * COSMIT * COSMIT * REV Compress example a bit. Move comment to top * FIX fit_grid_point's previous API must be preserved * Flake8 * TST Use loop; Compare with single-metric * REVIEW Use dict-comprehension instead of helper * REVIEW Remove redundant test * Fix tests incorrect braces * COSMIT * REVIEW Use regexp * REV Simplify aggregation of score dicts * FIX precision and accuracy test * FIX doctest and flake8 * TST the best_* attributes multimetric with single metric * Address @jnothman's review * Address more comments \o/ * DOCFIXES * Fix use the validated fit_param from fit's arguments * Revert alpha to a lower value as before * Using def instead of lambda * Address @jnothman's review batch 1: Fix tests / Doc fixes * Remove superfluous tests * Remove more superfluous testing * TST/FIX loop over refit and check found n_clusters * Cosmetic touches * Use zip instead of manually listing the keys * Fix inverse_transform * FIX bug in fit_grid_point; Allow only single score TST if fit_grid_point works as intended * ENH Use only ROC-AUC and F1-score * Fix typos and flake8; Address Andy's reviews MNT Add a comment on why we do such a transpose + some fixes * ENH Better error messages for incorrect multimetric scoring values +... ENH Avoid exception traceback while using incorrect scoring string * Dict keys must be of string type only * 1. Better error message for invalid scoring 2... Internal functions return single score for single metric scoring * Fix test failures and shuffle tests * Avoid wrapping scorer as dict in learning_curve * Remove doc example as asked for * Some leftover ones * Don't wrap scorer in validation_curve either * Add a doc example and skip it as dict order fails doctest * Import zip from six for python2.7 compat * Make cross_val_score return a cv_results-like dict * Add relevant sections to userguide * Flake8 fixes * Add whatsnew and fix broken links * Use AUC and accuracy instead of f1 * Fix failing doctests cross_validation.rst * DOC add the wrapper example for metrics that return multiple return values * Address andy's comments * Be less weird * Address more of andy's comments * Make a separate cross_validate function to return dict and a cross_val_score * Update the docs to reflect the new cross_validate function * Add cross_validate to toc-tree * Add more tests on type of cross_validate return and time limits * FIX failing doctests * FIX ensure keys are not plural * DOC fix * Address some pending comments * Remove the comment as it is irrelevant now * Remove excess blank line * Fix flake8 inconsistencies * Allow fit_times to be 0 to conform with windows precision * DOC specify how refit param is to be set in multiple metric case * TST ensure cross_validate works for string single metrics + address @jnothman's reviews * Doc fixes * Remove the shape and transform parameter of _aggregate_score_dicts * Address Joel's doc comments * Fix broken doctest * Fix the spurious file * Address Andy's comments * MNT Remove erroneous entry * Address Andy's comments * FIX broken links * Update whats_new.rst missing newline

* Fix Rouseeuw1984 broken link * Change label vbgmm to bgmm Previously modified with PR scikit-learn#6651 * Change tag name Old refers to new tag added with PR scikit-learn#7388 * Remove prefix underscore to match tag * Realign to fit 80 chars * Link to metrics.rst. pairwise metrics yet to be documented * Remove tag as LSHForest is deprecated * Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py. It is deprecated. * Fix few Sphinx warnings * Realign to 80 chars * Changes based on PR review * Remove unused ref in calibration * Fix link ref in covariance.rst * Fix linking issues * Differentiate Rouseeuw1999 tag within file. * Change all duplicate Rouseeuw1999 tags * Remove numbers from tag Rousseeuw

raghavrv mentioned this pull request Sep 13, 2016

[RFC] Changes to model_selection? #5053

Closed

amueller mentioned this pull request Sep 14, 2016

[MRG] Allow the scorer callable to return any object #7424

Closed

raghavrv mentioned this pull request Sep 15, 2016

[RFC?] Make cross_val_score output a dict/named tuple. #7435

Closed

raghavrv changed the title ~~[WIP] ENH Allow cross_val_score to evaluate on multiple metrics~~ [WIP] ENH Allow cross_val_score, GridSearchCV and RandomizedSearchCV to evaluate on multiple metrics Sep 17, 2016

raghavrv force-pushed the multimetric_cross_val_score branch 2 times, most recently from f6d3fe6 to 8b89687 Compare September 26, 2016 17:19

raghavrv force-pushed the multimetric_cross_val_score branch 2 times, most recently from 1149527 to f5a917d Compare September 29, 2016 22:53

raghavrv deleted the multimetric_cross_val_score branch July 11, 2017 15:32

raghavrv mentioned this pull request Jul 11, 2017

[MRG] Multimetric GridSearch - Memoize prediction results (and address some previous comments) #9326

Closed

balakmran pushed a commit to balakmran/scikit-learn that referenced this pull request Jul 20, 2017

Change tag name

d411eaf

Old refers to new tag added with PR scikit-learn#7388

balakmran mentioned this pull request Jul 20, 2017

Fix remaining sphinx errors #9417

Closed

raghavrv mentioned this pull request Jul 24, 2017

[MRG] DOC use def instead of lambda in the multimetric example at model_evaluation.rst #9442

Merged

minggli mentioned this pull request Apr 21, 2018

cross_val_score handling of multiple metrics #11006

Closed

Uh oh!

Conversation

raghavrv commented Sep 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODO

Uh oh!

jnothman commented Sep 12, 2016

Uh oh!

raghavrv commented Sep 12, 2016

Uh oh!

jnothman commented Sep 12, 2016

Uh oh!

jnothman commented Sep 12, 2016

Uh oh!

raghavrv commented Sep 12, 2016

Uh oh!

agramfort commented Sep 12, 2016 via email

Uh oh!

raghavrv commented Sep 15, 2016

Uh oh!

raghavrv commented Sep 16, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

raghavrv commented Sep 16, 2016

Uh oh!

amueller commented Sep 16, 2016

Uh oh!

amueller commented Sep 16, 2016

Uh oh!

raghavrv commented Sep 16, 2016

Uh oh!

raghavrv commented Sep 16, 2016

Uh oh!

amueller commented Sep 16, 2016

Uh oh!

raghavrv commented Sep 16, 2016

Uh oh!

amueller commented Sep 16, 2016

Uh oh!

raghavrv commented Sep 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Sep 17, 2016

Uh oh!

raghavrv commented Sep 26, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Sep 29, 2016

Uh oh!

raghavrv commented Sep 29, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

raghavrv commented Sep 29, 2016

Uh oh!

MechCoder commented Jul 8, 2017

Uh oh!

raghavrv commented Jul 8, 2017

Uh oh!

raghavrv commented Jul 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amueller commented Jul 10, 2017

Uh oh!

raghavrv commented Jul 10, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

raghavrv commented Sep 11, 2016 •

edited

Loading

raghavrv commented Sep 16, 2016 •

edited

Loading

raghavrv commented Sep 17, 2016 •

edited

Loading

raghavrv commented Sep 26, 2016 •

edited

Loading

raghavrv commented Sep 29, 2016 •

edited

Loading

raghavrv commented Jul 8, 2017 •

edited

Loading