dataset: Add TREC DL #3379
Task looks good! Can you target |
|
Thanks for your feedback! I'll update the pull request to target the v2 branch, compute the task statistics, and rename the file using snake case as requested. |
|
I have incorporated all of your feedback. You can check the statistics at the following links: |
|
Yes, that's good, but it's in |
|
Do you mean I should add the statistics to https://github.com/embeddings-benchmark/mteb/blob/main/docs/tasks.md? |
No, it should appear in |
|
I was able to contribute more easily thanks to your kind explanation 🙂 |
|
Amazing, thanks so much @whybe-choi and @Samoed for the feedback! Before we merge though, I think there may be too many queries? I think TREC DL 19 and 20 have under 100 queries but I see quite a few more here. Let me get the stats and update this. |
|
I think it's because all queries from the train, dev, and test subsets are combined. |
|
Yes, my apologies @whybe-choi, this turned out to be more complicated than I thought when I linked the data. That website is where TREC links to, but then they for some reason didn't include the judged queries. From the paper:
The official test sets of DL19 and DL20 are much much smaller (e.g. 43 for DL19 and 45 for DL20), but that's because they took a subset of the test queries for annotations. Weirdly, I cannot find these on any TREC website but maybe I am being dense. The easiest way I see to get those is to install the python package Again, I am so sorry about misdirecting you and thanks for your already excellent work here!
This is also part of it, although not the main reason -- but yes only include the test ones! |
|
That's fine. Thank you for the kind guidance. I will get back to work and ask for a review again! |
|
@orionw Whenever you have time, could you please review it again? |
|
Queries look perfect! The qrels seem still quite large though (9k qrels for ~50 queries?). If that contains all qrels you could probably filter by query-ids in the queries. |
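The filtering suggested here can be sketched as follows. This is a minimal illustration, assuming the qrels are stored as one JSON object per line with `query-id`, `corpus-id`, and `score` fields; the helper name `filter_qrels_by_queries` and the sample rows (including the `99999` query) are hypothetical and are not taken from the actual dataset or from mteb.

```python
import json


def filter_qrels_by_queries(qrels_lines, query_ids):
    """Keep only the judgments whose query-id appears in query_ids."""
    wanted = set(query_ids)
    kept = []
    for line in qrels_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        if row["query-id"] in wanted:
            kept.append(row)
    return kept


# Hypothetical sample: two judgments for a test query, one for an unrelated query.
sample_qrels = [
    '{"query-id": "23849", "corpus-id": "1020327", "score": 2}',
    '{"query-id": "23849", "corpus-id": "1120730", "score": 0}',
    '{"query-id": "99999", "corpus-id": "42", "score": 1}',
]
test_query_ids = ["23849"]

filtered = filter_qrels_by_queries(sample_qrels, test_query_ids)
print(len(filtered), "of", len(sample_qrels), "judgments kept")
```

The filter keys on the query ID only and leaves the `score` field untouched, so it does not drop any judgment for a query that is kept.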
|
I checked the dataset, and all qrels were indeed for the test query IDs, but the original dataset included items where the score is 0. Would it be okay to simply delete those?
{"query-id": "23849", "corpus-id": "1020327", "score": 2}
{"query-id": "23849", "corpus-id": "1034183", "score": 3}
{"query-id": "23849", "corpus-id": "1120730", "score": 0}
{"query-id": "23849", "corpus-id": "1139571", "score": 1}
{"query-id": "23849", "corpus-id": "1143724", "score": 0}
{"query-id": "23849", "corpus-id": "1147202", "score": 0}
{"query-id": "23849", "corpus-id": "1150311", "score": 0}
{"query-id": "23849", "corpus-id": "1158886", "score": 2}
... |
|
Ah oops, forgot how deeply judged these datasets are. That number looks right according to ir_datasets. Definitely don't delete them! LGTM! |
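To sanity-check numbers like these, one could tally judgments per query and per relevance grade. The sketch below is illustrative only: it uses the eight sample rows quoted earlier in this thread, and `summarize_qrels` is a hypothetical helper, not part of mteb or ir_datasets.

```python
import json
from collections import Counter, defaultdict

# The eight sample judgments quoted in the thread (one JSON object per line).
SAMPLE = """
{"query-id": "23849", "corpus-id": "1020327", "score": 2}
{"query-id": "23849", "corpus-id": "1034183", "score": 3}
{"query-id": "23849", "corpus-id": "1120730", "score": 0}
{"query-id": "23849", "corpus-id": "1139571", "score": 1}
{"query-id": "23849", "corpus-id": "1143724", "score": 0}
{"query-id": "23849", "corpus-id": "1147202", "score": 0}
{"query-id": "23849", "corpus-id": "1150311", "score": 0}
{"query-id": "23849", "corpus-id": "1158886", "score": 2}
"""


def summarize_qrels(jsonl_text):
    """Return {query-id: Counter({score: number of judgments})}."""
    per_query = defaultdict(Counter)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        per_query[row["query-id"]][row["score"]] += 1
    return per_query


summary = summarize_qrels(SAMPLE)
for qid, grades in summary.items():
    total = sum(grades.values())
    print(qid, total, dict(sorted(grades.items())))
# prints: 23849 8 {0: 4, 1: 1, 2: 2, 3: 1}
```

Rows with score 0 are counted alongside the positive grades, which is why the number of judgments per query can be much larger than the number of relevant passages.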
|
@whybe-choi Can this PR be merged? |
|
@Samoed I think it is ready to merge. However, I uploaded the dataset to my own Hugging Face repository. Is that okay? |
|
Yes, that is okay, but you need to update the revision of the repository if you updated the qrels. |
|
This is already the revision that reflects the correct qrels. As far as I know, there shouldn't be any problem! |
|
Hi, @whybe-choi @orionw. Thanks for your implementation of the TREC DL 2019 and 2020 datasets. I have one question though. In ir_datasets, it looks like there are 3.2M docs for TREC DL 2019, but I see 8.8M in @whybe-choi's dataset. Do you have any idea why? I don't know which is right. |
|
Hello, @yjoonjang !
|
|
Ahh okay. Thank you for your help !! |


Close #3348
This pull request adds support for two new retrieval tasks, TRECDL2019 and TRECDL2020, to the English retrieval task suite. It introduces their implementations, descriptive statistics, and integrates them into the task registry, expanding the benchmark coverage for TREC Deep Learning tracks.
New Retrieval Task Support
- Added TRECDL2019 and TRECDL2020 retrieval tasks in trecdl_retrieval.py, including metadata, dataset references, and evaluation details.
- Added TRECDL2019.json and TRECDL2020.json, providing sample counts and text/query statistics. [1] [2]
Task Registry Integration
- Registered TRECDL2019 and TRECDL2020 in the English retrieval task module (__init__.py), making them available for evaluation and selection. [1] [2]