[MRG] Add DCG and NDCG#7739
[MRG] Add DCG and NDCG#7739agramfort merged 27 commits intoscikit-learn:masterfrom davidgasquez:dcg_metrics
Conversation
sklearn/metrics/ranking.py
Outdated
| return np.sum(gain / discounts) | ||
|
|
||
|
|
||
| def ndcg_score(ground_truth, predictions, k=5): |
There was a problem hiding this comment.
you probably should use y_true and y_pred, or did I miss something?
sklearn/metrics/ranking.py
Outdated
| """ | ||
| lb = LabelBinarizer() | ||
| lb.fit(range(len(predictions) + 1)) | ||
| T = lb.transform(ground_truth) |
There was a problem hiding this comment.
We generally avoid single-letter variable name
|
| >>> ground_truth = [1, 0, 2] | ||
| >>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]] | ||
| >>> score = ndcg_score(ground_truth, predictions, k=2) | ||
| 1.0 |
There was a problem hiding this comment.
If you store the result in score, there is no printing, which is why this doctest is failing
…into dcg_metrics
…into dcg_metrics
|
Hey there @TomDLT! Thanks a lot for this useful tips for someone like me! I've tried to fix and improve the implementation a bit. I still have to:
Again, I'd appreciate a lot if you want to let me know anything that I could improve. Thanks again! 😄 |
…into dcg_metrics
…into dcg_metrics
…into dcg_metrics
…into dcg_metrics
…into dcg_metrics
|
Hey there @TomDLT! Sorry for the delay with this one. Would love to hear how it feels now that it has his own test and they are complete. 😄 Also, I'd love to get some input on which might be the next steps for me. Thanks for your help! |
TomDLT
left a comment
There was a problem hiding this comment.
Thanks for the update, it looks much better!
You have to fix your unit test, it's failing.
Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.
sklearn/metrics/ranking.py
Outdated
| ---------- | ||
| .. [1] `Wikipedia entry for the Discounted Cumulative Gain | ||
| <https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_ | ||
| .. [2] `Gist about Ranking Metrics` |
There was a problem hiding this comment.
I am not sure we want to keep this reference.
| y_score, y_true = check_X_y(y_score, y_true) | ||
|
|
||
| lb = LabelBinarizer() | ||
| lb.fit(range(max(max(y_true) + 1, len(y_true)))) |
There was a problem hiding this comment.
why do you need this?
can you add a comment?
There was a problem hiding this comment.
This is to avoid skipping some test labels when the input is [0, 1, 3] in the test set but in the training step we were using [0, 1, 2, 3]
|
Just pushed another series of small commits @TomDLT!
Not sure if I'm the best one to do that as I've only used it once and briefly! |
…into dcg_metrics
|
Any updates on this? |
|
If your pull-request is ready to be merged, change the prefix from |
…into dcg_metrics
…into dcg_metrics
…into dcg_metrics
…into dcg_metrics
|
thanks @davidgasquez I'll send a PR now to fix cosmits + update what's new |
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
|
This appears to have been merged without review..? Issues at #9921 (comment), #9930, #9929, #9931 |
|
I think we should consider reverting this for 0.19.1. It implements a ?non-standard definition of NDCG without sufficient documentation, and does so in a way that makes it unusable. |
* Add DCG and NDCG ranking functions
* Add DCG and NDCG ranking functions
Reference Issue
DCG and NDCG implementation. Issue #2805
What does this implement/fix? Explain your changes.
Add two scores,
dcg_scoreandndcg_scoreto compute Discounted Cumulative Gain (DCG) or the Normalized one (NDCG) at rank K.Any other comments?
As a first time collaborator, let me know if I should add or change something!