Added gini coefficient to ranking and scorer #10084
tagomatech wants to merge 4 commits into scikit-learn:master
jnothman left a comment
Please add this to metrics/tests/test_common.py and also add specific tests that this matches known scores on toy datasets.
    return np.mean(scores)


def gini(y_true, y_score):
Perhaps name this gini_score for consistency
    ----------
    .. [1] David J. Hand and Robert J. Till (2001).
           A Simple Generalisation of the Area Under the ROC Curve for
           Multiple Class Classification Problems. In Machine Learning, 45,
Your implementation does not currently extend to multiclass. You have merely implemented a chance-corrected binary ROC AUC.
@tagomatech Could you please explain why we need the Gini coefficient when we already have roc_auc_score? It can almost be replaced by roc_auc_score, and it seems hard to find any reference about its definition and application in ML. I don't think the paper you provide is a good reference. It only states that the Gini index (Gini coefficient?) is equivalent to roc_auc_score, and the whole paper is based on roc_auc_score.
@qinhanmin2014
@tagomatech Thanks.
I am -1 to merge since the score can be easily computed from the ROC AUC.
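To make that relationship concrete, here is a minimal sketch, assuming binary targets and the normalized Gini convention seen in Kaggle competitions, where the score is taken to be 2 * AUC - 1. The helper name `gini_from_auc` is hypothetical and is not the function added in this PR.

```python
from sklearn.metrics import roc_auc_score


def gini_from_auc(y_true, y_score):
    """Hypothetical helper: chance-corrected AUC, i.e. 2 * AUC - 1.

    Assumes binary y_true and continuous scores in y_score.
    """
    return 2.0 * roc_auc_score(y_true, y_score) - 1.0


# Toy data: 3 of the 4 (negative, positive) pairs are ranked correctly.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)    # 0.75
gini = gini_from_auc(y_true, y_score)   # 0.5
```

Under this convention a random ranking scores about 0 and a perfect ranking scores 1, which is the only difference from the plain AUC.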
@tagomatech Thanks a lot for your contribution. Sorry, but I'm going to close this one given the other -1 above. I think the general consensus is that it can be replaced by roc_auc_score and that there's no clear definition.
Actually the Gini coefficient is defined in terms of area under the Lorenz curve (for positive regression models) which is not the same as ROC AUC. I started an undocumented prototype implementation in #15176. |
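For illustration only, a rough sketch of a Lorenz-curve-based Gini coefficient. This is not the prototype from #15176, whose details are not shown here. It assumes non-negative targets with a positive sum, ranks samples by predicted value in ascending order, and takes the coefficient as 1 minus twice the area under the Lorenz curve. The name `lorenz_gini` is hypothetical.

```python
import numpy as np


def lorenz_gini(y_true, y_score):
    """Hypothetical helper: Gini coefficient from the Lorenz curve.

    Assumes y_true is non-negative with a positive sum.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)

    # Rank samples from lowest to highest predicted value.
    order = np.argsort(y_score, kind="stable")

    # Lorenz curve: cumulative share of the target vs. share of samples.
    cum_share = np.cumsum(y_true[order]) / y_true.sum()
    lorenz_y = np.concatenate([[0.0], cum_share])
    lorenz_x = np.linspace(0.0, 1.0, len(lorenz_y))

    # Trapezoidal area under the Lorenz curve.
    area = np.sum((lorenz_y[1:] + lorenz_y[:-1]) / 2.0 * np.diff(lorenz_x))
    return 1.0 - 2.0 * area


print(lorenz_gini([1, 2, 3, 4], [1, 2, 3, 4]))  # targets ranked by themselves
print(lorenz_gini([1, 1, 1, 1], [1, 2, 3, 4]))  # equal targets
```

Because the targets here are real-valued rather than binary labels, this quantity is not in general a simple function of the ROC AUC.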
Added a function at the end of sklearn\metrics\ranking.py to compute the Gini coefficient, which is being used in some Kaggle competitions.
I added the corresponding import declaration in sklearn\metrics\__init__.py.
Finally, I created a scorer à la sklearn in sklearn\metrics\scorer.py, so that the Gini coefficient can be used across sklearn validation/metrics functions, e.g. cross_val_score.
Reference was taken here and results were checked against several entries on Kaggle and the sklearn AUC/ROC score (it is not rocket science, to be honest).
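As a sketch of what using such a scorer with cross_val_score could look like without changing sklearn itself: cross_val_score accepts any callable with the signature `(estimator, X, y)` as `scoring`. The function `gini_scorer` below is a hypothetical stand-in, not the scorer from this PR, and it assumes a binary classifier that exposes `decision_function`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score


def gini_scorer(estimator, X, y):
    """Hypothetical scorer callable: 2 * AUC - 1 on continuous scores."""
    scores = estimator.decision_function(X)
    return 2.0 * roc_auc_score(y, scores) - 1.0


# Synthetic binary classification problem for demonstration.
X, y = make_classification(n_samples=200, random_state=0)

cv_gini = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=gini_scorer
)
print(cv_gini)  # one Gini value per fold
```

The same values can be obtained by running cross_val_score with `scoring="roc_auc"` and rescaling the result, which is the point made by the reviewers.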