
Remove HAGRID from french benchmark#235

Merged
Muennighoff merged 119 commits into embeddings-benchmark:main from Lyon-NLP:main on Feb 27, 2024

Conversation

@MathieuCiancone

Contributor

While working on MTEB for French, we also added the HAGRID task (English, retrieval).
While it's great to have it, it shouldn't be in the run_mteb_french benchmark, since that benchmark is meant to cover French tasks only.
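
For illustration, the intent of the change can be sketched as a language filter over a task list. This is only a minimal sketch and not the actual contents of run_mteb_french.py: the task names, the `langs` field, and the `select_tasks` helper are all assumptions made up for this example.

```python
# Hypothetical task registry: names and language tags are illustrative only
# and are not taken from the real run_mteb_french.py script.
ALL_TASKS = [
    {"name": "ExampleFrenchClustering", "langs": ["fr"]},
    {"name": "ExampleFrenchRetrieval", "langs": ["fr"]},
    {"name": "HagridRetrieval", "langs": ["en"]},  # English-only retrieval task
]


def select_tasks(tasks, lang):
    """Return the names of the tasks that cover the given language code."""
    return [task["name"] for task in tasks if lang in task["langs"]]


# The French benchmark should only run tasks tagged as French,
# so the English-only task drops out of the list.
FRENCH_TASKS = select_tasks(ALL_TASKS, "fr")
print(FRENCH_TASKS)
```

The task stays available in the library; it is only left out of the list that the French script runs.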

gsequeiraOS and others added 30 commits November 7, 2023 09:26
Masakhane dataset and french script for classification
* add Opusparcus dataset

* multilingual usage

* use eval_split of config files

* change eval_split according to data

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
HAL S2S dataset creation and evaluation on clustering task.
* Add DiaBLa dataset for bitext mining

* Add DiaBLa dataset for bitext mining

* deduplicate bitext task

* add Flores

* format files

* add flores to evaluation script

* remove prints

* add revision

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
imenelydiaker and others added 28 commits January 29, 2024 19:19
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
…pusparcuspc

Inherit OpusparcusPC init from MultilingualTask
* put script on HF dataset repos

* remove scripts
* add trust remote code arg

* leave corpus as dict

* remove trust remote code
add bucc and tatoeba bitextmining tasks
* add other language to clustering tasks

* fix main score and S2S task

* update run fr benchmark script

* Update run_mteb_french.py

* Update AbsTaskClustering.py

* remove train and validation splits
@Muennighoff Muennighoff merged commit d01d053 into embeddings-benchmark:main Feb 27, 2024

7 participants