Skip to content

[MRG] Add DCG and NDCG#7739

Merged
agramfort merged 27 commits intoscikit-learn:masterfrom
davidgasquez:dcg_metrics
Jun 8, 2017
Merged

[MRG] Add DCG and NDCG#7739
agramfort merged 27 commits intoscikit-learn:masterfrom
davidgasquez:dcg_metrics

Conversation

@davidgasquez
Copy link
Copy Markdown
Contributor

Reference Issue

DCG and NDCG implementation. Issue #2805

What does this implement/fix? Explain your changes.

Add two scores, dcg_score and ndcg_score to compute Discounted Cumulative Gain (DCG) or the Normalized one (NDCG) at rank K.

Any other comments?

As a first time collaborator, let me know if I should add or change something!

return np.sum(gain / discounts)


def ndcg_score(ground_truth, predictions, k=5):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably should use y_true and y_pred, or did I miss something?

"""
lb = LabelBinarizer()
lb.fit(range(len(predictions) + 1))
T = lb.transform(ground_truth)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally avoid single-letter variable name

@TomDLT
Copy link
Copy Markdown
Member

TomDLT commented Oct 24, 2016

  • Could you add a reference link for these metrics?
  • Also, we will need some unit testing, that you can add in sklearn/metrics/tests/test_ranking.py.
  • And you can add the metrics in scikit-learn/sklearn/metrics/__init__.py
  • Ideally, you could write a small text on the documentation, with the maths formula and explaining when these metrics should be used.

>>> ground_truth = [1, 0, 2]
>>> predictions = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> score = ndcg_score(ground_truth, predictions, k=2)
1.0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you store the result in score, there is no printing, which is why this doctest is failing

@davidgasquez
Copy link
Copy Markdown
Contributor Author

davidgasquez commented Oct 27, 2016

Hey there @TomDLT! Thanks a lot for this useful tips for someone like me! I've tried to fix and improve the implementation a bit. I still have to:

  • Add reference links to the metrics
  • Include unit testing
  • Add the metrics in scikit-learn/sklearn/metrics/__init__.py

Again, I'd appreciate a lot if you want to let me know anything that I could improve. Thanks again! 😄

@davidgasquez
Copy link
Copy Markdown
Contributor Author

Hey there @TomDLT! Sorry for the delay with this one. Would love to hear how it feels now that it has his own test and they are complete. 😄

Also, I'd love to get some input on which might be the next steps for me. Thanks for your help!

Copy link
Copy Markdown
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, it looks much better!
You have to fix your unit test, it's failing.

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

----------
.. [1] `Wikipedia entry for the Discounted Cumulative Gain
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_
.. [2] `Gist about Ranking Metrics`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we want to keep this reference.

y_score, y_true = check_X_y(y_score, y_true)

lb = LabelBinarizer()
lb.fit(range(max(max(y_true) + 1, len(y_true))))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this?
can you add a comment?

Copy link
Copy Markdown
Contributor Author

@davidgasquez davidgasquez Jan 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid skipping some test labels when the input is [0, 1, 3] in the test set but in the training step we were using [0, 1, 2, 3]

@davidgasquez
Copy link
Copy Markdown
Contributor Author

Just pushed another series of small commits @TomDLT!

Can you also add a small text on the documentation (doc/modules/model_evaluation.rst, in the ranking section), with the maths formula and explaining when these metrics should be used.

Not sure if I'm the best one to do that as I've only used it once and briefly!

@davidgasquez
Copy link
Copy Markdown
Contributor Author

Any updates on this?

@TomDLT
Copy link
Copy Markdown
Member

TomDLT commented Mar 24, 2017

If your pull-request is ready to be merged, change the prefix from [WIP] to [MRG], and wait for some reviews. I'll try to take a look shortly.

@davidgasquez davidgasquez changed the title [WIP] Add DCG and NDCG [MRG] Add DCG and NDCG Mar 24, 2017
@agramfort agramfort merged commit 419f21b into scikit-learn:master Jun 8, 2017
@agramfort
Copy link
Copy Markdown
Member

thanks @davidgasquez

I'll send a PR now to fix cosmits + update what's new

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
* Add DCG and NDCG ranking functions
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Add DCG and NDCG ranking functions
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Add DCG and NDCG ranking functions
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* Add DCG and NDCG ranking functions
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* Add DCG and NDCG ranking functions
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
* Add DCG and NDCG ranking functions
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Oct 16, 2017

This appears to have been merged without review..? Issues at #9921 (comment), #9930, #9929, #9931

@jnothman
Copy link
Copy Markdown
Member

I think we should consider reverting this for 0.19.1. It implements a ?non-standard definition of NDCG without sufficient documentation, and does so in a way that makes it unusable.

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* Add DCG and NDCG ranking functions
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* Add DCG and NDCG ranking functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants