Conversation
|
This enables grid search with AUC score. What do you think about doing it like this? cc @ogrisel |
|
This should now work robustly in the two-class case. It still lacks tests, and an informative error when called with multiclass data, I think. |
|
Maybe we could define a new decorator for score functions:

```python
def requires_thresholds(f):
    f.requires_thresholds = True
    return f


@requires_thresholds
def auc_score(y_true, y_thresholds):
    [...]


@requires_thresholds
def averaged_precision_score(y_true, y_thresholds):
    [...]
```

And then instead of testing for |
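For instance, the check could become a plain attribute lookup rather than a test on the function's name (a minimal sketch; the helper name is made up):

```python
def needs_thresholds(score_func):
    # Grid search could rely on the attribute set by the decorator
    # instead of testing for specific function names.
    return getattr(score_func, "requires_thresholds", False)
```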
|
@GaelVaroquaux you might want to have a look at this PR :) |
|
@ogrisel that seems like a workable approach. My solution seemed hacky but I didn't see much need to go beyond that for the moment. Yours is nicer of course. |
|
Guys, just looking at this PR. IMHO, this is the kind of problem that I |
|
ROC AUC is such a common evaluation criterion (to compare results with benchmarks in the literature or to compete on challenges for instance) that I think the default scikit-learn API should make it possible to compute those scores for the default estimators without having to subclass them. |
|
+1 There should definitely be a way for the user to do this. |
|
Right, but I think that it should be solved at the estimator level, not at the score function level. Maybe we need a new Mixin, and that mixin should define a new estimator |
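A sketch of what such a mixin might look like (the class name and the use of roc_auc_score, which was called auc_score at the time of this discussion, are only illustrative):

```python
from sklearn.metrics import roc_auc_score  # named auc_score in older releases


class ThresholdScoreMixin(object):
    """Hypothetical mixin that redefines score() in terms of AUC
    for binary classifiers exposing a decision_function."""

    def score(self, X, y):
        # predict_proba(X)[:, 1] would be the fallback for estimators
        # that have no decision_function.
        return roc_auc_score(y, self.decision_function(X))
```

An estimator would then opt in by inheriting from the mixin, and grid search could pick up the new score() method without any change to the score functions themselves.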
|
I think it would still be good to have a declarative description of the expectations of the scoring functions: it would be interesting to declare, for instance, whether a score function requires thresholds rather than hard predictions.
These sorts of declarations could be useful both for computing automated documentation summaries and for building model selection and evaluation utilities (either as part of the scikit-learn project or in third-party software). |
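For example, a documentation or model-selection utility could collect such declarations by introspection (a sketch; requires_thresholds is the attribute proposed above, greater_is_better is an invented one):

```python
import inspect

from sklearn import metrics


def summarize_score_functions(module=metrics):
    """Collect declared properties of score functions (illustrative only)."""
    summary = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.endswith("_score"):
            summary[name] = {
                "requires_thresholds": getattr(func, "requires_thresholds", False),
                "greater_is_better": getattr(func, "greater_is_better", True),
            }
    return summary
```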
|
@GaelVaroquaux Ok, it seems I didn't understand your approach before. I'll think about @GaelVaroquaux's suggestion and might come up with a PR at some point.
No, I actually changed my mind based on your feedback :). Sorry, I was
I agree. I'd like to postpone this for later, as it is a challenging G |
|
I wish it was that easy ;) |
Tell us if we can do something about it. If you just want time to focus |
|
I'm mostly busy with my internship at MSRC atm - or rather should be. |
Don't worry: I am ridiculously busy too: I hardly have a day free on my |
|
So wouldn't another idea be to rework the interface between grid search and the score functions, such that the score function gets access to the fitted model itself?
There are some issues with this approach: A. The added complexity of the score function needing to directly interact with a model object. I think we can address (A) by using inheritance or having a few helper functions; the helper function could be something like the one sketched below. I suppose a similar approach would be to leave score functions as they are now, but create something like a GridScoreFunction() that does what I discussed above and is assembled from the raw score function. Then the input to GridSearch isn't the raw score function, but the GridScoreFunction object. |
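A rough sketch of both pieces (the names _predict_for_score and GridScoreFunction are only illustrative; the original snippet is not shown above):

```python
def _predict_for_score(estimator, X, needs_thresholds):
    """Hypothetical helper: get the kind of output a score function needs."""
    if not needs_thresholds:
        return estimator.predict(X)
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    return estimator.predict_proba(X)[:, 1]


class GridScoreFunction(object):
    """Hypothetical wrapper assembled from a raw score function."""

    def __init__(self, score_func, needs_thresholds=False):
        self.score_func = score_func
        self.needs_thresholds = needs_thresholds

    def __call__(self, estimator, X, y):
        y_pred = _predict_for_score(estimator, X, self.needs_thresholds)
        return self.score_func(y, y_pred)
```

Grid search would then be given e.g. GridScoreFunction(roc_auc_score, needs_thresholds=True) rather than the raw score function.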
|
If I understand your (first) proposal, it is basically the same as the proposal above, only that the score function is now a separate function that gets the model, instead of a method, right? |
|
Yes, I think you're right. I wonder if there's an even cleaner solution that involves creating a new kind of class to pass around (e.g. Scorer). IMHO, we'd like a general solution that doesn't require us to keep track of all the various combinations of possibilities (single / multi-class, y_hat versus p_hat, models lacking p_hat support). |
|
@kyleabeauchamp I agree. On the other hand, we don't want the user to have to build nested object constructs just to be able to call basic operations such as computing a score for an estimator. Hence I like the ability to have declarative decorators. Alternatively, we could turn the scoring utilities into classes with explicit method names that take either categorical predictions, thresholds, or predictive models as different inputs to different methods, depending on the score's capabilities. |
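Such a class-based scorer might look roughly like this (method names are invented; the point is one explicit entry point per kind of input):

```python
from sklearn.metrics import roc_auc_score  # named auc_score in older releases


class AucScorer(object):
    """Hypothetical class-based scorer with explicit methods per input kind."""

    def from_predictions(self, y_true, y_pred):
        raise NotImplementedError("AUC is not defined on hard label predictions")

    def from_thresholds(self, y_true, y_thresholds):
        return roc_auc_score(y_true, y_thresholds)

    def from_estimator(self, estimator, X, y_true):
        return self.from_thresholds(y_true, estimator.decision_function(X))
```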
|
I liked @GaelVaroquaux's idea and I'll try it out when I get back home in about two weeks. |
|
If I may (after reading this discussion): +1 for @GaelVaroquaux's proposal. |
|
Just to make sure, @GaelVaroquaux's suggestion (as I understand it) means that the |
|
I reread the discussion above and now I noticed that @GaelVaroquaux proposed to add a new mixin. I'd rather add the |
|
I think it would be easier to make alternative design proposals with a WIP pull request, without documentation and with just a single test to demonstrate concrete usage of the API. |
|
That is definitely good, only some work ;) I'll give it a shot. |
I am afraid that AUC requires a decision_function or predict_proba, and |
|
I thought all classifiers had either decision_function or predict_proba. |
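If some classifiers expose neither, a scorer would need a small capability check along these lines (sketch; the helper name is made up):

```python
def can_score_with_thresholds(estimator):
    # AUC-style scores need continuous outputs, so the estimator must
    # expose at least one of these two methods.
    return (hasattr(estimator, "decision_function")
            or hasattr(estimator, "predict_proba"))
```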
|
Closed via #1381. |