Fix SummEval NaN scores by Muennighoff · Pull Request #33 · embeddings-benchmark/mteb

Muennighoff · 2022-08-03T19:42:15Z

If all scores for a sample are the same in SummEval, the correlation computation produces NaN scores.
For example, some models encode the machine summaries below as the same embedding.
I couldn't come up with a better solution than skipping such samples. Any better ideas?

["japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers in the air next to their daughters . images are from the 37-year-old 's latest book that roughly translates to daughter and a salaryman each image . the images are also to be part of a book of the father-daughter photographer . the 37-year-old has been captured by photographer photographer yûki .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose . the images are from the 37-year-old 's latest book which roughly translates into daughter and salary man .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . the images are from the 37-year-old 's latest book which roughly translates into daughter and salary man . a series of images by photographer yûki aoyama sees fathers leaping into the air next to their daughters .", "photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . 
in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose .", "japanese photographer yûki aoyama 's latest series of images capture po - faced teenagers pictured next to their fathers leaping into the air . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose . a series of images by photographer yûki aoyama sees fathers leaping into the air next to their daughters the images are from the 37 - year - old 's latest book which roughly translates into daughter and salary man .", "japanese photographer yûki aoyama's latest series of images capture po-faced teenagers pictured next to their fathers . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose . the images are said to be part of a book which roughly translates as daughter and salaryman .", 'A father-daughter pair who have been posing for their father in front of various Japanese landmarks are making a name for themselves online. Related:', "japanese photographer y ? ki aoyama's latest series of images capture po-faced teenagers . in each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose . the images are said to be part of a book which roughly translates as daughter and salary man .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . a series of images by photographer yûki aoyama sees fathers leaping into the air next to their daughters . the images are from the 37-year-old 's latest book which roughly translates into daughter and salary man .", "sick of awkward father-daughter portraits ? well one photographer has found an effective - if a little odd - way of making them more interesting . 
japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air .", "Japanese photographer Yûki Aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers leaping into the air . In each picture the daughter looks directly into the camera smiling while her father pulls a dramatic pose . The images are said to be part of a book which roughly translates into Daughter and Salaryman .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers pictured next to their fathers . the images are from the 37-year-old 's latest book which roughly translates as daughter and salaryman . each image sees the daughter stood po-faced as their father makes an energetic leap .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers next to their daughters . images are said to be part of a book which roughly translates into daughter and salary man . the 37-year-old 's images include .", "japanese photographer yûki aoyama 's latest series of images capture po-faced teenagers . images are from the 37-year-old 's latest book which translates into daughter and salary man . photographer yûki aoyama sees fathers leaping into air next to their daughters . in each picture the daughter looks directly into the camera smiling while father pulls a dramatic pose ."]
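
To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `pearson`, `has_variance`, and `mean_correlation` are hypothetical helpers, and the hand-written Pearson function returns NaN for constant input as an assumption that mirrors how common statistics libraries behave when the denominator is zero. It shows that one sample with identical model scores turns the averaged result into NaN, and that skipping samples with no variance avoids this.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; NaN when either input is constant (assumed convention)."""
    if len(set(xs)) == 1 or len(set(ys)) == 1:
        # Zero variance means a zero denominator, so the correlation is undefined.
        return float("nan")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def has_variance(scores):
    """True if the scores are not all identical."""
    return len(set(scores)) > 1


def mean_correlation(samples):
    """Average the per-sample correlation, skipping samples with no variance.

    samples: list of (human_scores, model_scores) pairs, one pair per source text.
    Returns (mean correlation over kept samples, number of skipped samples).
    """
    kept = []
    skipped = 0
    for human_scores, model_scores in samples:
        if not (has_variance(human_scores) and has_variance(model_scores)):
            skipped += 1
            continue
        kept.append(pearson(human_scores, model_scores))
    mean = sum(kept) / len(kept) if kept else float("nan")
    return mean, skipped


# Illustrative data only, not taken from SummEval.
samples = [
    ([1.0, 2.0, 3.0], [0.2, 0.4, 0.6]),  # model scores vary with the human scores
    ([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]),  # identical embeddings give identical scores
]

naive = [pearson(h, m) for h, m in samples]  # second entry is NaN
naive_mean = sum(naive) / len(naive)         # NaN propagates into the mean
mean, skipped = mean_correlation(samples)    # NaN-free mean, one sample skipped
```

The trade-off of skipping is that the reported score is computed over fewer samples, so logging the number of skipped samples alongside the score may be worthwhile.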

NouamaneTazi · 2022-08-03T22:30:26Z

LGTM! Would be cool to get @nreimers's opinion also before merging :)

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

Muennighoff · 2022-08-14T14:52:33Z

@nreimers Are you okay with merging this? I'm curious whether you have a better idea for handling this kind of behavior, though.



Muennighoff added 5 commits August 3, 2022 21:24

Drop nans · 20c22a9
Skip samples with no variance · d39be65
Remove superfluous imports · 68f7307
Remove debug leftovers · c674d0a
Add consistent brackets · 2cdd283

Muennighoff requested review from NouamaneTazi and loicmagne and removed request for loicmagne August 3, 2022 19:52

NouamaneTazi reviewed Aug 3, 2022

Comment thread mteb/evaluation/evaluators/SummarizationEvaluator.py Outdated

NouamaneTazi approved these changes Aug 3, 2022

NouamaneTazi requested a review from nreimers August 3, 2022 22:31

Update mteb/evaluation/evaluators/SummarizationEvaluator.py · f667749

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

Muennighoff merged commit 48586e2 into embeddings-benchmark:main Aug 26, 2022

Muennighoff merged 6 commits into embeddings-benchmark:main from Muennighoff:fix/summeval
