[MRG+2] Require explicit average arg for multiclass/label P/R/F metrics and scorers#2679
Conversation
I think it is a bit weird that the …
Can you briefly explain why this change is necessary after #2610 is merged?
Thanks for looking at this, Andy. Responses:
@arjoly, I'd like it if you could review or comment on this at some point.
Do you plan to move the averaging keyword to the third argument? Do you want to remove the default value or set the default value to None?
It is essential that we stop the default being 'weighted'. I don't mind changing average to None by default… but I don't think that …
Looks good to merge!
Thanks for the review, @arjoly
I think that the docs (the part that describes the different scoring options http://scikit-learn.org/dev/modules/model_evaluation.html#common-cases-predefined-values ) should be updated to stress this.
I think I should just merge this PR with #2676 which focuses on that
specifically. It doesn't entirely make sense without it.
In benchmarks/bench_multilabel_metrics.py:
@@ -20,7 +20,7 @@
 METRICS = {
-    'f1': f1_score,
+    'f1': partial(f1_score, average='micro'),
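The diff above replaces the bare `f1_score` with a version whose averaging is pinned via `functools.partial`, so the benchmark never relies on a default. A minimal sketch of the same idea (the toy labels below are invented for illustration):

```python
# Pin the averaging mode once with functools.partial, as in the
# benchmark diff. f1_score is scikit-learn's; the data is made up.
from functools import partial

from sklearn.metrics import f1_score

micro_f1 = partial(f1_score, average='micro')

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 2, 1]

# Identical to calling f1_score(y_true, y_pred, average='micro')
print(micro_f1(y_true, y_pred))
```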
Note to self (and other reviewer) this merge has been done.
I've rebased this on #2676, so that both the metrics and scorers are explicit.
And that rebase means @arjoly's LGTM no longer applies. If you'd like to review the whole PR, Arnaud, that would be nice ;)
Is there a need to make …
There are also several new constants such as …
It would be nice to add an …
sklearn/metrics/scorer.py
Yes, IMO. If someone wrote their own CV utility, they should be using … I should note that …
I agree that there's a lot of unnecessary mess in the module namespace, but I think that's out of this PR's scope.

Rebased and addressed @arjoly's comments.
doc/whats_new.rst
LGTM
Thanks for the feedback! I hope overall that it's the right thing to be doing.
Rebased again. I'd really like to see this merged, finally. @mblondel, you expressed distaste for the default …
Thank you!
sklearn/metrics/classification.py
I find it more elegant to simply use double quotes (") when the string is delimited with single quotes, and vice versa, rather than escaping the quotes.
Not that it really matters, so don't change anything.
Looks good to me! 👍 for merge. Thanks a lot.
I can have a look at this tomorrow (if it hasn't been merged by then already).
Thanks Gaël! Now get back to your research proposals! ;)
doc/modules/model_evaluation.rst
Why does it have to calculate twice? ;)
@jnothman I'm still at a point where I learn more from Pull Requests than Pull Requests learn from me, so sorry if my comments caused more pain (in explaining) than pleasure. ;)
There was no problem with the comments. Thanks for catching some small issues.
I'll merge after Travis confirms that I haven't done anything silly.
…ault. Scorers for different average parameters have been added.
Thanks @jnothman!

Sorry, I am at a conference and didn't have time to review. Glad that this is finally in. Regarding 'weighted', my main concern was just that we should really try to avoid using non-standard stuff as a default. On the other hand, there is the question of backward compatibility, and it's hard to tell which would be the better default between micro and macro averaging. When the default (weighted) is used, we could potentially raise a warning and tell the user to explicitly specify the averaging option. This way, users will be able to correctly report the results they got when writing a paper.
In order to avoid problems like #2094, and to avoid people unwittingly reporting the weighted average, this goes towards making `average` a required parameter for multiclass/multilabel precision, recall and F-score. Closely related to #2676.
After a deprecation cycle, we can turn the warning into an error, or make macro/micro the default.
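To illustrate why the explicit choice matters, a small sketch (toy labels invented here) of how much the averaging mode changes the reported score on the same predictions:

```python
from sklearn.metrics import f1_score

# Toy multiclass data, invented for illustration only.
# Note class 2 is never predicted, so its per-class F1 is 0
# (scikit-learn emits an UndefinedMetricWarning for it).
y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 1]

for average in ('micro', 'macro', 'weighted'):
    print(average, f1_score(y_true, y_pred, average=average))
```

On this data micro (which reduces to accuracy here) gives 0.6, macro about 0.433 and weighted 0.58, so an unlabelled "F1" number is ambiguous.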
This PR also shards the builtin scorers to make the averaging explicit. This avoids users getting binary behaviour when they shouldn't (cf. #2094, where `scoring` isn't used). I think this is extra important because "weighted" F1 isn't especially common in the literature, and having people report it without realising that's what it is is unhelpful to the applied ML community. This helps, IMO, towards a more explicit and robust API for binary classification metrics (cf. #2610). It also entails a deprecation procedure for scorers, and more public API there: `get_scorer` and `list_scorers`.
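A sketch of looking up one of the sharded per-average scorers by name; `get_scorer` is importable from `sklearn.metrics` in current scikit-learn, and names like `'f1_macro'` correspond to the explicit-average scorers described above (the iris fit is just for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# A scorer takes (estimator, X, y) rather than (y_true, y_pred),
# so it can be passed around by CV utilities via scoring='f1_macro'.
f1_macro = get_scorer('f1_macro')
print(f1_macro(clf, X, y))  # macro-averaged F1 on the training data
```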