feat: Forbid task metadata and add upload functions #1362
@Samoed is this PR stale?
Yes. Some datasets (mostly from CMTEB) load data from multiple repositories on HF, so we need to convert them before we can complete this PR.
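Converting such a dataset amounts to merging the corpus, queries, and relevance judgments, which currently come from separate repositories, into one set of rows per config. Below is a minimal sketch, assuming a BEIR-style layout (`corpus`, `queries`, and a `default` config for qrels, with `_id`, `title`, `text`, `query-id`, `corpus-id`, `score` columns). The function name and input shapes are hypothetical; this is not the PR's actual upload code.

```python
from typing import Dict, List

# Hypothetical in-memory shapes after loading the parts from separate repositories:
#   corpus:  {doc_id: {"title": str, "text": str}}
#   queries: {query_id: text}
#   qrels:   {query_id: {doc_id: relevance score}}
Corpus = Dict[str, Dict[str, str]]
Queries = Dict[str, str]
Qrels = Dict[str, Dict[str, int]]


def to_beir_rows(corpus: Corpus, queries: Queries, qrels: Qrels) -> Dict[str, List[dict]]:
    """Flatten the three parts into one list of rows per (assumed) config name.

    Raises ValueError if a judgment points at a query or document that is
    missing from the merged data, which is easy to miss when the parts come
    from different repositories.
    """
    dangling = [
        (qid, doc_id)
        for qid, docs in qrels.items()
        for doc_id in docs
        if qid not in queries or doc_id not in corpus
    ]
    if dangling:
        raise ValueError(f"qrels reference unknown ids: {dangling[:5]}")

    return {
        "corpus": [
            {"_id": doc_id, "title": doc.get("title", ""), "text": doc["text"]}
            for doc_id, doc in corpus.items()
        ],
        "queries": [{"_id": qid, "text": text} for qid, text in queries.items()],
        # Qrels go into the "default" config in the assumed layout.
        "default": [
            {"query-id": qid, "corpus-id": doc_id, "score": score}
            for qid, docs in qrels.items()
            for doc_id, score in docs.items()
        ],
    }


rows = to_beir_rows(
    corpus={"d1": {"title": "", "text": "Beijing is the capital of China."}},
    queries={"q1": "capital of China"},
    qrels={"q1": {"d1": 1}},
)
print(rows["default"])  # [{'query-id': 'q1', 'corpus-id': 'd1', 'score': 1}]
```

In recent versions of the `datasets` library, each list could then be wrapped with `Dataset.from_list(...)` and uploaded with `push_to_hub(repo_id, config_name=...)`; that step is left out so the sketch runs offline.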
# Conflicts:
#   mteb/abstasks/TaskMetadata.py

# Conflicts:
#   mteb/tasks/Reranking/zho/CMTEBReranking.py
#   tests/test_TaskMetadata.py
I’m traveling and won’t be at a computer until the end of the week, but this looks good. Are there any datasets that are still not converted? And is the mFollowIR Russian score still an issue? FWIW, the v2 branch includes a small bug fix that the current one doesn’t have, so the numbers from main and v2 will differ. The number looks reasonable, and I wouldn’t worry about it.
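Checking whether a reupload reproduces the original results boils down to comparing `main_score` per task and flagging differences above a tolerance. A minimal sketch with a hypothetical helper and an arbitrarily chosen tolerance; the two `mFollowIR(rus)` values are the ones reported in the PR description (main branch vs. reuploaded dataset).

```python
import math
from typing import Dict, List, Tuple


def mismatched_scores(
    reference: Dict[str, float],
    candidate: Dict[str, float],
    abs_tol: float = 1e-4,
) -> List[Tuple[str, float, float]]:
    """Return (task, reference, candidate) for every task whose main_score differs.

    A task missing from either side gets NaN for the absent value.
    """
    mismatches = []
    for task in sorted(reference.keys() | candidate.keys()):
        ref = reference.get(task, math.nan)
        cand = candidate.get(task, math.nan)
        # math.isclose is False whenever either value is NaN, so missing tasks are flagged too.
        if not math.isclose(ref, cand, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append((task, ref, cand))
    return mismatches


main_branch = {"mFollowIR(rus)": -0.039465099069488106}
reuploaded = {"mFollowIR(rus)": -0.031187925634321677}
print(mismatched_scores(main_branch, reuploaded))
```

An absolute tolerance is used because `main_score` values near zero (as here) make relative tolerances misleading.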
I've tested from
@KennethEnevoldsen, can you review, please?
KennethEnevoldsen left a comment
A few minor things, and a suggestion to move the upload utility to the class object (assuming we want to maintain it).
Generally, though, this looks great!
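Moving the upload utility onto the class object could look roughly like this: the task that already holds its data also exposes a method that converts and pushes it, so each reupload script shrinks to one call. A minimal sketch; `RetrievalTaskSketch`, `push_dataset_to_hub`, and the `push` callable are hypothetical names, not mteb's API, and the data is simplified to plain text. The uploader is injected so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical uploader signature: (repo_name, config_name, split, rows) -> None.
Pusher = Callable[[str, str, str, List[dict]], None]


@dataclass
class RetrievalTaskSketch:
    """Toy stand-in for a retrieval task that owns its own upload logic."""

    # split -> {doc_id: text}
    corpus: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # split -> {query_id: text}
    queries: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # split -> {query_id: {doc_id: score}}
    relevant_docs: Dict[str, Dict[str, Dict[str, int]]] = field(default_factory=dict)

    def push_dataset_to_hub(self, repo_name: str, push: Pusher) -> None:
        """Convert every split to flat rows and hand each config to `push`."""
        for split in self.corpus:
            push(repo_name, "corpus", split,
                 [{"_id": d, "text": t} for d, t in self.corpus[split].items()])
            push(repo_name, "queries", split,
                 [{"_id": q, "text": t} for q, t in self.queries[split].items()])
            push(repo_name, "default", split, [
                {"query-id": q, "corpus-id": d, "score": s}
                for q, docs in self.relevant_docs[split].items()
                for d, s in docs.items()
            ])


calls = []
task = RetrievalTaskSketch(
    corpus={"test": {"d1": "doc"}},
    queries={"test": {"q1": "query"}},
    relevant_docs={"test": {"q1": {"d1": 1}}},
)
# Record the calls instead of uploading.
task.push_dataset_to_hub("org/dataset", lambda *args: calls.append(args))
print([c[1] for c in calls])  # ['corpus', 'queries', 'default']
```

In a real task, `push` might wrap something like `datasets.Dataset.from_list(rows).push_to_hub(...)`; keeping it injectable leaves the conversion testable without touching the Hub.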
…al.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* fix FilipinoHateSpeechClassification
* update tests
Checklist
- `make test`.
- `make lint`.

Some retrieval tasks need to be reuploaded because they are loaded from different repositories. I’ve created upload functions to convert these datasets into our current format. I tested each reuploaded dataset, and the scores matched, except for `mFollowIR(rus)`: on the main branch, the `main_score` is `-0.039465099069488106`, whereas the reuploaded dataset gives `-0.031187925634321677`. However, this run was only for testing purposes.

Initially, I tried adding this script to the `mteb` folder, but it raised an error: `AttributeError: module 'logging' has no attribute 'getLogger'`. So, I moved it to the `scripts` folder.

Additionally, some tasks may not import successfully. For example, I tried to load `IndicXnliPairClassification`, but it resulted in an error.

uploaded.zip
mteb main.zip
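One common cause of `AttributeError: module 'logging' has no attribute 'getLogger'` is a local module named `logging` shadowing the standard library: running a file as a script puts that file's directory first on `sys.path`, so `import logging` picks up the local file instead. Whether that is what happened inside the `mteb` folder is an assumption; the PR does not diagnose it. The sketch below only reproduces the mechanism in a temporary directory, with a hypothetical `upload_script.py`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # An empty local module that happens to share its name with the stdlib module.
    (folder / "logging.py").write_text("")
    # A script next to it that expects the standard library's logging.
    (folder / "upload_script.py").write_text(
        "import logging\nlogger = logging.getLogger(__name__)\n"
    )
    # Running the file as a script puts `folder` first on sys.path, so
    # `import logging` resolves to the empty local file.
    # -E and -S ignore PYTHON* environment variables and skip the site module,
    # so nothing pre-imports logging or changes sys.path before the script runs.
    result = subprocess.run(
        [sys.executable, "-E", "-S", str(folder / "upload_script.py")],
        capture_output=True,
        text=True,
    )

print(result.returncode != 0)
print(result.stderr)
```

Renaming the shadowing module, or running the code as part of a package (`python -m ...`) instead of as a loose script, avoids the clash.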
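To find every task that fails to load, as `IndicXnliPairClassification` did, one option is to call each task's loader and record the exception instead of stopping at the first failure. A minimal sketch with a hypothetical helper and stand-in loaders; with mteb installed, the loaders could presumably be built as `mteb.get_task(name).load_data`, though that depends on the installed version's API.

```python
from typing import Callable, Dict


def failing_loaders(loaders: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Run every loader and return {task_name: "ExceptionType: message"} for failures."""
    failures = {}
    for name, load in loaders.items():
        try:
            load()
        except Exception as exc:  # keep going so one broken task doesn't hide others
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def broken_loader() -> None:
    # Stand-in for a task whose data loading raises; not the actual error from the PR.
    raise ValueError("example failure")


print(failing_loaders({"WorkingTask": lambda: None, "BrokenTask": broken_loader}))
# {'BrokenTask': 'ValueError: example failure'}
```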