[WIP] FIX ndcg to work for arbitrarily many samples #9928
jnothman wants to merge 2 commits into scikit-learn:master
Conversation
This is intended to be a quick fix for 0.19.1. I am creating other issues to address shortfalls in the ndcg API and testing.
Sorry to disturb if I'm not qualified to post my opinion here.

```python
y_true = [0, 1, 0, 1]
y_score = [[0.15, 0.85], [0.7, 0.3], [0.06, 0.94], [0.7, 0.3]]
metrics.ndcg_score(y_true, y_score)
```

There seem to be two problems, and the second is not solved here. Also, if the current implementation is right, could we provide a reference for users (and for me :) )? I can't find any reference that is consistent with the current implementation, and the reference in the doc is a dead link. Personally, I still don't think the current implementation is right. It's a simple copy-paste from Kaggle, and I think ogrisel's implementation here is at least what I would use.
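For context, the usual textbook definition being argued about here (and roughly what ogrisel's gist computes) can be sketched as below. This is an illustrative sketch, not the scikit-learn implementation under discussion:

```python
import numpy as np

def dcg(relevances, k):
    """Discounted cumulative gain at rank k, with a log2 discount."""
    relevances = np.asarray(relevances, dtype=float)[:k]
    if relevances.size == 0:
        return 0.0
    # Positions 1..n are discounted by log2(position + 1).
    discounts = np.log2(np.arange(2, relevances.size + 2))
    return float(np.sum(relevances / discounts))

def ndcg(relevances, k):
    """DCG normalized by the best achievable DCG (ideal ordering)."""
    best = dcg(sorted(relevances, reverse=True), k)
    if best == 0:
        return 0.0
    return dcg(relevances, k) / best
```

With this definition, a perfectly ordered ranking scores 1.0 and any worse ordering scores strictly between 0 and 1, which is a useful sanity check against whatever the merged implementation returns.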
Yes, of course, I forgot about the binary case quirk. Thanks.

I'm not certain about what definitions are standard. I don't really think we should be implementing true learning-to-rank metrics in scikit-learn; it is not a task our estimators solve. But we can use multilabel and multiclass evaluations based on ndcg. The current implementation handles the multiclass case, and it should be easy to extend to the multilabel case. But atm I'm just trying to put out fires. The alternative way to do so is to retract the implementation: after all, clearly no one has used it.
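For readers unfamiliar with the "binary case quirk": by scikit-learn convention, binary problems often arrive with labels in a single vector rather than one column per class, so they need to be expanded before a per-class gain computation. A minimal sketch of that normalization, assuming integer class labels (the helper name is hypothetical, not scikit-learn's actual code):

```python
import numpy as np

def one_hot_relevance(y_true, n_classes):
    # Hypothetical helper: expand integer class labels into a 0/1
    # relevance matrix with one column per class, so binary y_true
    # gets two columns just like any other multiclass problem.
    y_true = np.asarray(y_true)
    rel = np.zeros((y_true.shape[0], n_classes))
    rel[np.arange(y_true.shape[0]), y_true] = 1.0
    return rel
```

After this step, each row of the relevance matrix can be scored against the corresponding row of `y_score` without special-casing the two-class problem.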
Fixes #9921
TODO: add test for binary case