
Returning Evaluation results #26

Merged: NouamaneTazi merged 5 commits into embeddings-benchmark:main from AmrMKayid:return-results on Aug 5, 2022

Conversation


@AmrMKayid AmrMKayid commented Jul 20, 2022

Contributor

Return the evaluation results dictionary if `return_results` is set to True.

Comment thread mteb/evaluation/MTEB.py Outdated
Comment thread mteb/evaluation/MTEB.py Outdated
AmrMKayid and others added 3 commits July 20, 2022 14:02
Co-authored-by: holidaydrien <adrien.morisot@gmail.com>
Co-authored-by: holidaydrien <adrien.morisot@gmail.com>
@NouamaneTazi

Member

Apologies for the delay. Thank you for the contribution! I think many users would find this handy. Would you like to add a small example of how to use the flag in the README?

@Muennighoff

Muennighoff commented Aug 3, 2022

Contributor

It doesn't hurt to always return them, so I would remove the kwarg and always return them. What do you think, @NouamaneTazi?

@NouamaneTazi

Member

True, I don't see a problem with that either, as long as we update the related docs.

@Muennighoff

Contributor

@AmrMKayid If you could make it the default & update the docstring of `run`, that would be amazing!

@AmrMKayid

Contributor Author

@NouamaneTazi @Muennighoff Thank you very much for the feedback! I have made it the default and updated the docs :))

@NouamaneTazi

Member

Amazing! Thanks for the clean PR :)

@NouamaneTazi NouamaneTazi merged commit 8f3242c into embeddings-benchmark:main Aug 5, 2022
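
The design change agreed on in this thread, dropping the opt-in kwarg so that `run` always returns the results dictionary, can be illustrated with a minimal sketch. `ToyEvaluation`, `length_task`, and the `accuracy` key are hypothetical names used for illustration only; this is not the MTEB implementation, and the real return structure may differ.

```python
from typing import Callable, Dict

Scores = Dict[str, float]
Model = Callable[[str], int]
Task = Callable[[Model], Scores]


class ToyEvaluation:
    """Hypothetical stand-in for a benchmark runner (not the real MTEB class)."""

    def __init__(self, tasks: Dict[str, Task]) -> None:
        self.tasks = tasks

    def run(self, model: Model) -> Dict[str, Scores]:
        """Evaluate `model` on every task and always return the scores."""
        results: Dict[str, Scores] = {}
        for name, task in self.tasks.items():
            results[name] = task(model)
        # No opt-in flag: the dictionary is returned unconditionally,
        # so callers that do not need it can simply ignore it.
        return results


def length_task(model: Model) -> Scores:
    # Toy task: fraction of inputs for which the model returns the string length.
    inputs = ["a", "bb", "ccc"]
    correct = sum(model(text) == len(text) for text in inputs)
    return {"accuracy": correct / len(inputs)}


evaluation = ToyEvaluation(tasks={"LengthTask": length_task})
results = evaluation.run(len)
print(results["LengthTask"]["accuracy"])
```

Returning the dictionary unconditionally keeps the call signature smaller and does not break callers that previously ignored the return value.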
Muennighoff added a commit that referenced this pull request Feb 22, 2024
* add Masakhane dataset config

* add trigram lang code for dataset who use it

* create french script eval

* fix French word

* add some documentation

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* 4 pair classification (#10)

* add Opusparcus dataset

* multilingual usage

* use eval_split of config files

* change eval_split according to data

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* DiaBLa and Flores Bitext Mining evaluation (#12)

* Add DiaBLa dataset for bitext mining

* Add DiaBLa dataset for bitext mining

* deduplicate bitext task

* add Flores

* format files

* add flores to evaluation script

* remove prints

* add revision

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* adding dataset processing for mteb

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* fix change on langmapping

* reset alphabetical order

* add revision handling

* Clustering: Add AlloProf dataset  (#17)

AlloProf dataset for clustering task

* handling of revision

* change split + add revision handling

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* adding dataset processing for mteb

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* add script to process and upload alloprof on HF

* adding dataset processing for mteb

* refactor few thing

* reset alphabetical order

* add revision handling

* handling of revision

* change split + add revision handling

* use eval variable

* alphabetic order

* Add MLSUM dataset for clustering task (#21)

* Use Masakhane dataset for clustering task (#23)

* 16 add datasets to readmemd (#18)

* run task table

* run task table

* Add MLSUM dataset for clustering task (#21)

* Use Masakhane dataset for clustering task (#23)

* run task table

* refresh readme

* refresh readme

* run task table

* refresh readme

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>

* load only test split (#25)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* Syntec dataset addition (#26)

* add scrpit to process & load to HF

* add script to enable download of data from HF

* add syntec dataset files to gitignore

* add syntecretrieval

* add syntec retrival

* build dataloading script

* remove datasets

* correct typo

---------

Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr>

* 30 add syntec reranking (#31)

* change name to secify retrieval

* add reranking tasks

* create script to upload dataset fo reranking task

* create reranking task

* add reranking tasks

* add model name in description

* SummEval translated to french (#32)

* 7 sts (#33)

* taike into account multilingual tasks

* add stsbenchmark multilingual dataset

* add STS tasks

* taike into account multilingual tasks

* add stsbenchmark multilingual dataset

* add STS tasks

* add coma

* Adding sick fr dataset to sts tasks (#34)

* Adding sick fr dataset to sts tasks
* modifying dataset in load function to have the right column names

* Fix alloprof dataset (#36)

* change revision to use

* remove duplicate data

* change main metric because dataset is hard (#37)

* Fix alloprof dataset (#40)

* change revision to use

* remove duplicate data

* change revision

* handle queries train test split

* change dataset creation method

* change revision

* handle queries train test split

* change dataset creation method

* Fix DiaBLa by inheriting CrossLingual class (#42)

* Fix DiaBLa by inheriting CrossLingual class

* remove remaining print

* Fix DiaBLa integration

* Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

* Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

* Update README.md

* Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py

* Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py

* Update mteb/tasks/STS/SickFrSTS.py

* Inherit OpusparcusPC init from MultilingualTask

* remove unnecessary init

* Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

* put script on HF dataset repos (#56)

* put script on HF dataset repos

* remove scripts

* 49 fix dictionnary in syntecretrieval (#54)

* add trust remote code arg

* leave corpus as dict

* remove trust remote code

* add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

* 46 add other languages to masakhaneweclusterings2s and p2p (#58)

* add other language to clustering tasks

* fix main score and S2S task

* update run fr becnhmark script

* Update run_mteb_french.py

* Update AbsTaskClustering.py

* remove train and validation splits

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>
Co-authored-by: mciancone@openstudio.fr <mciancone@openstudio.fr>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: mciancone <73994289+Sunalwing@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: wissam-sib <36303760+wissam-sib@users.noreply.github.com>
Co-authored-by: Wissam Siblini <wissam.siblini92@gmail.com>
Muennighoff added a commit that referenced this pull request Feb 27, 2024
* add Masakhane dataset config

* add trigram lang code for dataset who use it

* create french script eval

* fix French word

* add some documentation

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* 4 pair classification (#10)

* add Opusparcus dataset

* multilingual usage

* use eval_split of config files

* change eval_split according to data

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* Clustering with HAL S2S dataset (#11)

HAL S2S dataset creation and evaluation on clustering task.

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* DiaBLa and Flores Bitext Mining evaluation (#12)

* Add DiaBLa dataset for bitext mining

* Add DiaBLa dataset for bitext mining

* deduplicate bitext task

* add Flores

* format files

* add flores to evaluation script

* remove prints

* add revision

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* adding dataset processing for mteb

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* fix change on langmapping

* reset alphabetical order

* add revision handling

* Clustering: Add AlloProf dataset  (#17)

AlloProf dataset for clustering task

* handling of revision

* change split + add revision handling

* add script to process and upload alloprof on HF

* build script for HF

* adding dataset processing for mteb

* refactor few thing

* remove whitespaces

* adding dataset processing for mteb

* adding BSARD dataset

* add BSARD to benchmark

* adding Hagrid dataset

* add script to process and upload alloprof on HF

* adding dataset processing for mteb

* refactor few thing

* reset alphabetical order

* add revision handling

* handling of revision

* change split + add revision handling

* use eval variable

* alphabetic order

* Add MLSUM dataset for clustering task (#21)

* Use Masakhane dataset for clustering task (#23)

* 16 add datasets to readmemd (#18)

* run task table

* run task table

* Add MLSUM dataset for clustering task (#21)

* Use Masakhane dataset for clustering task (#23)

* run task table

* refresh readme

* refresh readme

* run task table

* refresh readme

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>

* load only test split (#25)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* Update mteb/tasks/BitextMining/DiaBLaBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/HALClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* renaming masakhane (#28)

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>

* Syntec dataset addition (#26)

* add scrpit to process & load to HF

* add script to enable download of data from HF

* add syntec dataset files to gitignore

* add syntecretrieval

* add syntec retrival

* build dataloading script

* remove datasets

* correct typo

---------

Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr>

* 30 add syntec reranking (#31)

* change name to secify retrieval

* add reranking tasks

* create script to upload dataset fo reranking task

* create reranking task

* add reranking tasks

* add model name in description

* SummEval translated to french (#32)

* 7 sts (#33)

* taike into account multilingual tasks

* add stsbenchmark multilingual dataset

* add STS tasks

* taike into account multilingual tasks

* add stsbenchmark multilingual dataset

* add STS tasks

* add coma

* Adding sick fr dataset to sts tasks (#34)

* Adding sick fr dataset to sts tasks
* modifying dataset in load function to have the right column names

* Fix alloprof dataset (#36)

* change revision to use

* remove duplicate data

* change main metric because dataset is hard (#37)

* Fix alloprof dataset (#40)

* change revision to use

* remove duplicate data

* change revision

* handle queries train test split

* change dataset creation method

* change revision

* handle queries train test split

* change dataset creation method

* Fix DiaBLa by inheriting CrossLingual class (#42)

* Fix DiaBLa by inheriting CrossLingual class

* remove remaining print

* Fix DiaBLa integration

* Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Classification/MasakhaNEWSClassification.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update README.md

* Update mteb/tasks/BitextMining/FloresBitextMining.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/abstasks/AbsTaskPairClassification.py

Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>

* Update README.md

* Update scripts/data/syntec/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/data/alloprof/create_data_reranking.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update scripts/run_mteb_french.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/evaluation/MTEB.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Retrieval/HagridRetrieval.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MLSUMClusteringP2P.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MLSUMClusteringS2S.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update mteb/tasks/Clustering/MasakhaNEWSClusteringP2P.py

* Update mteb/tasks/Clustering/MasakhaNEWSClusteringS2S.py

* Update mteb/tasks/STS/SickFrSTS.py

* Inherit OpusparcusPC init from MultilingualTask

* remove unnecessary init

* Remove train split from evaluation on MasakhaNEWSClassification (#52)

remove train split from evaluation

* put script on HF dataset repos (#56)

* put script on HF dataset repos

* remove scripts

* 49 fix dictionnary in syntecretrieval (#54)

* add trust remote code arg

* leave corpus as dict

* remove trust remote code

* add Tatoeba & BUCC BitextMining tasks (#57)

add bucc and tatoeba bitextmining tasks

* 46 add other languages to masakhaneweclusterings2s and p2p (#58)

* add other language to clustering tasks

* fix main score and S2S task

* update run fr becnhmark script

* Update run_mteb_french.py

* Update AbsTaskClustering.py

* remove train and validation splits

* remove Hagrid (#60)

---------

Co-authored-by: Gabriel Sequeira <gsequeira@openstudio.fr>
Co-authored-by: Marion Schaeffer <92590517+schmarion@users.noreply.github.com>
Co-authored-by: mciancone@openstudio.fr <mciancone@openstudio.fr>
Co-authored-by: Sequeira Gabriel <gabriel.sequeira@outlook.fr>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: wissam-sib <36303760+wissam-sib@users.noreply.github.com>
Co-authored-by: Wissam Siblini <wissam.siblini92@gmail.com>

4 participants