Conversation
|
This enables grid search with AUC score. What do you think about doing it like this? cc @ogrisel |
|
This should now work robustly in the two-class case. It still lacks tests, and an informative error when called with multiclass data, I think. |
|
Maybe we could define a new decorator for score functions:

```python
def requires_thresholds(f):
    f.requires_thresholds = True
    return f


@requires_thresholds
def auc_score(y_true, y_thresholds):
    [...]


@requires_thresholds
def averaged_precision_score(y_true, y_thresholds):
    [...]
```

And then instead of testing for |
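For instance, the check could become a plain attribute lookup rather than a test on the function's name (a minimal sketch; the helper name is made up):

```python
def needs_thresholds(score_func):
    # Grid search could rely on the attribute set by the decorator
    # instead of testing for specific function names.
    return getattr(score_func, "requires_thresholds", False)
```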
|
@GaelVaroquaux you might want to have a look at this PR :) |
|
@ogrisel that seems like a workable approach. My solution seemed hacky but I didn't see much need to go beyond that for the moment. Yours is nicer of course. |
|
Guys, just looking at this PR. IMHO, this is the kind of problem that I |
|
ROC AUC is such a common evaluation criterion (to compare results with benchmarks in the literature or to compete on challenges for instance) that I think the default scikit-learn API should make it possible to compute those scores for the default estimators without having to subclass them. |
|
+1 There should definitely be a way for the user to do this. |
|
Right, but I think that it should be solved at the estimator level, not at the score function level. Maybe we need a new Mixin, and that mixin should define a new estimator |
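A sketch of what such a mixin might look like (the class name and the use of roc_auc_score, which was called auc_score at the time of this discussion, are only illustrative):

```python
from sklearn.metrics import roc_auc_score  # named auc_score in older releases


class ThresholdScoreMixin(object):
    """Hypothetical mixin that redefines score() in terms of AUC
    for binary classifiers exposing a decision_function."""

    def score(self, X, y):
        # predict_proba(X)[:, 1] would be the fallback for estimators
        # that have no decision_function.
        return roc_auc_score(y, self.decision_function(X))
```

An estimator would then opt in by inheriting from the mixin, and grid search could pick up the new score() method without any change to the score functions themselves.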
|
I think it would still be good to have a declarative description of the expectations of the scoring functions: it would be interesting to declare, for instance, whether a score function requires thresholds rather than hard predictions.
These sorts of declarations could be useful both for computing automated documentation summaries and for building model selection and evaluation utilities (either as part of the scikit-learn project or in third-party software). |
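For example, a documentation or model-selection utility could collect such declarations by introspection (a sketch; requires_thresholds is the attribute proposed above, greater_is_better is an invented one):

```python
import inspect

from sklearn import metrics


def summarize_score_functions(module=metrics):
    """Collect declared properties of score functions (illustrative only)."""
    summary = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.endswith("_score"):
            summary[name] = {
                "requires_thresholds": getattr(func, "requires_thresholds", False),
                "greater_is_better": getattr(func, "greater_is_better", True),
            }
    return summary
```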
|
@GaelVaroquaux Ok, it seems I didn't understand your approach before. I'll think about @GaelVaroquaux's suggestion and might come up with a PR at some point.
No, I actually changed my mind based on your feedback :). Sorry, I was
I agree. I'd like to postpone this for later, as it is a challenging G |
|
I wish it was that easy ;) |
Tell us if we can do something about it. If you just want time to focus |
|
I'm mostly busy with my internship at MSRC atm - or rather should be. |
Don't worry: I am ridiculously busy too: I hardly have a day free on my |
|
So wouldn't another idea be to rework the interface between grid search and the score functions, such that the score function gets access to the fitted model itself?
There are some issues with this approach: A. The added complexity of the score function needing to directly interact with a model object. I think we can address (A) by using inheritance or having a few helper functions; the helper function could be something like the one sketched below. I suppose a similar approach would be to leave score functions as they are now, but create something like a GridScoreFunction() that does what I discussed above and is assembled from the raw score function. Then the input to GridSearch isn't the raw score function, but the GridScoreFunction object. |
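A rough sketch of both pieces (the names _predict_for_score and GridScoreFunction are only illustrative; the original snippet is not shown above):

```python
def _predict_for_score(estimator, X, needs_thresholds):
    """Hypothetical helper: get the kind of output a score function needs."""
    if not needs_thresholds:
        return estimator.predict(X)
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    return estimator.predict_proba(X)[:, 1]


class GridScoreFunction(object):
    """Hypothetical wrapper assembled from a raw score function."""

    def __init__(self, score_func, needs_thresholds=False):
        self.score_func = score_func
        self.needs_thresholds = needs_thresholds

    def __call__(self, estimator, X, y):
        y_pred = _predict_for_score(estimator, X, self.needs_thresholds)
        return self.score_func(y, y_pred)
```

Grid search would then be given e.g. GridScoreFunction(roc_auc_score, needs_thresholds=True) rather than the raw score function.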
|
If I understand your (first) proposal, it is basically the same as the proposal above, only that the score function is now a separate function that gets the model, instead of a method, right? |
|
Yes, I think you're right. I wonder if there's an even cleaner solution that involves creating a new kind of class to pass around (e.g. Scorer). IMHO, we'd like a general solution that doesn't require us to keep track of all the various combinations of possibilities (single / multi-class, y_hat versus p_hat, models lacking p_hat support). |
|
@kyleabeauchamp I agree. On the other hand, we don't want the user to have to build nested object constructs just to be able to call basic operations such as computing a score for an estimator. Hence I like the ability to have declarative decorators. Alternatively, we could turn the scoring utilities into classes with explicit method names that take either categorical predictions, thresholds, or predictive models as different inputs to different methods, depending on the score's capabilities. |
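Such a class-based scorer might look roughly like this (method names are invented; the point is one explicit entry point per kind of input):

```python
from sklearn.metrics import roc_auc_score  # named auc_score in older releases


class AucScorer(object):
    """Hypothetical class-based scorer with explicit methods per input kind."""

    def from_predictions(self, y_true, y_pred):
        raise NotImplementedError("AUC is not defined on hard label predictions")

    def from_thresholds(self, y_true, y_thresholds):
        return roc_auc_score(y_true, y_thresholds)

    def from_estimator(self, estimator, X, y_true):
        return self.from_thresholds(y_true, estimator.decision_function(X))
```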
|
I liked @GaelVaroquaux's idea and I'll try it out when I get back home in about two weeks. |
|
If I may (after reading this discussion): +1 for @GaelVaroquaux's proposal. |
|
Just to make sure, @GaelVaroquaux's suggestion (as I understand it) means that the |
|
I reread the discussion above and now I noticed that @GaelVaroquaux proposed to add a new mixin. I'd rather add the |
|
I think it would be easier to make alternative design proposals with a WIP pull request, without documentation and with just a single test to demonstrate concrete usage of the API. |
|
That is definitely good, only some work ;) I'll give it a shot. |
I am afraid that AUC requires a decision_function or predict_proba, and |
|
I thought all classifiers had either decision_function or predict_proba. |
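If some classifiers expose neither, a scorer would need a small capability check along these lines (sketch; the helper name is made up):

```python
def can_score_with_thresholds(estimator):
    # AUC-style scores need continuous outputs, so the estimator must
    # expose at least one of these two methods.
    return (hasattr(estimator, "decision_function")
            or hasattr(estimator, "predict_proba"))
```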
|
Closed via #1381. |