Remove HAGRID from French benchmark #235
Merged
Masakhane dataset and French script for classification
* add Opusparcus dataset
* multilingual usage
* use eval_split of config files
* change eval_split according to data

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
HAL S2S dataset creation and evaluation on clustering task.
* Add DiaBLa dataset for bitext mining
* deduplicate bitext task
* add Flores
* format files
* add flores to evaluation script
* remove prints
* add revision

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Fix conflicts with base repo
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
…pusparcuspc Inherit OpusparcusPC init from MultilingualTask
remove train split from evaluation
* put script on HF dataset repos
* remove scripts
* add trust remote code arg
* leave corpus as dict
* remove trust remote code
add BUCC and Tatoeba bitext mining tasks
* add other language to clustering tasks
* fix main score and S2S task
* update run fr benchmark script
* Update run_mteb_french.py
* Update AbsTaskClustering.py
* remove train and validation splits
Muennighoff approved these changes on Feb 27, 2024
While working on MTEB for French, we also added the Hagrid task (English, retrieval). While it's great to have it, it shouldn't be in the `run_mteb_french` benchmark.
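One way to keep English-only tasks such as Hagrid out of a French run is to filter the candidate task list on each task's evaluation languages. The sketch below is a minimal, self-contained illustration under stated assumptions: `TaskInfo` and `select_tasks_for_language` are hypothetical helpers, not part of MTEB or of `run_mteb_french.py`, and the task names other than Hagrid are placeholders. This PR may simply delete the task from the script's list instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical minimal task descriptor (not MTEB's real metadata class)."""

    name: str
    task_type: str
    languages: tuple[str, ...]  # ISO 639-1 codes the task is evaluated in


def select_tasks_for_language(tasks: list[TaskInfo], lang: str) -> list[TaskInfo]:
    """Keep only tasks evaluated in `lang`, preserving the original order."""
    return [task for task in tasks if lang in task.languages]


# Hypothetical candidate list: only the Hagrid entry refers to a real task.
candidates = [
    TaskInfo("FrenchRetrievalTaskA", "Retrieval", ("fr",)),
    TaskInfo("Hagrid", "Retrieval", ("en",)),  # English-only, so dropped below
    TaskInfo("BitextTaskB", "BitextMining", ("en", "fr")),
]

french_tasks = select_tasks_for_language(candidates, "fr")
print([task.name for task in french_tasks])
```

Filtering on language metadata, rather than hand-editing a list, means a new English-only task added later would also be excluded automatically; bilingual tasks that include French (for example, bitext mining pairs) are kept.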