fix: remove * imports #1569
* add more stat * add more stat * update statistics
Bugfixes with data parsing in main figure
* Fixed task result loading from disk * Fixed task result loading from disk
* fix: Removed column wrapping on the table, so that it remains readable * Added disclaimer to figure * fix: Added links to task info table, switched out license with metric
* small fix * fix: fix
swap touche2020 for parity
* add sum per lang * add sort by sum option * make lint
* feat: add CUREv1 dataset --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com> Co-authored-by: Daniel Buades Marcos <daniel@buad.es> * feat: add missing domains to medical tasks * feat: modify benchmark tasks * chore: benchmark naming --------- Co-authored-by: nadshe <nadia.sheikh@clinia.com> Co-authored-by: olivierr42 <olivier.rousseau@clinia.com>
* check if model attr of model exists * lint * Fix retrieval evaluator
* Made get_scores error tolerant * Added join_revisions, made get_scores failsafe * Fetching metadata fixed for HF models * Added failsafe metadata fetching to leaderboard code * Added revision joining to leaderboard app * fix * Only show models that have metadata, when filter_models is called * Ran linting
Filtering for models that have metadata
* align readme with current mteb * align with mieb branch * fix test
* add lang family mapping and map to task table * make lint * add back some unclassified lang codes
* Correction of SICK-R metadata * Correction of SICK-R metadata --------- Co-authored-by: rposwiata <rposwiata@opi.org.pl>
…05` and `text-multilingual-embedding-002` (#1562) * fix: google_models batching and prompt * feat: add text-embedding-005 and text-multilingual-embedding-002 * chore: `make lint` errors * fix: address PR comments
fix: bm25s implementation
# Conflicts:
# docs/create_tasks_table.py
# docs/tasks.md
# mteb/abstasks/AbsTaskClassification.py
# mteb/abstasks/AbsTaskClusteringFast.py
# mteb/abstasks/AbsTaskInstructionRetrieval.py
# mteb/abstasks/AbsTaskMultilabelClassification.py
# mteb/abstasks/AbsTaskPairClassification.py
# mteb/abstasks/AbsTaskReranking.py
# mteb/abstasks/AbsTaskRetrieval.py
# mteb/abstasks/AbsTaskSTS.py
# mteb/descriptive_stats/InstructionRetrieval/Core17InstructionRetrieval.json
# mteb/descriptive_stats/MultilabelClassification/MultiEURLEXMultilabelClassification.json
# mteb/descriptive_stats/Reranking/AskUbuntuDupQuestions.json
# mteb/descriptive_stats/Reranking/ESCIReranking.json
# mteb/descriptive_stats/Reranking/WikipediaRerankingMultilingual.json
# mteb/descriptive_stats/Retrieval/AppsRetrieval.json
# mteb/descriptive_stats/Retrieval/BelebeleRetrieval.json
# mteb/descriptive_stats/Retrieval/COIRCodeSearchNetRetrieval.json
# mteb/descriptive_stats/Retrieval/CodeEditSearchRetrieval.json
# mteb/descriptive_stats/Retrieval/CodeFeedbackMT.json
# mteb/descriptive_stats/Retrieval/CodeFeedbackST.json
# mteb/descriptive_stats/Retrieval/CodeSearchNetCCRetrieval.json
# mteb/descriptive_stats/Retrieval/CodeSearchNetRetrieval.json
# mteb/descriptive_stats/Retrieval/CodeTransOceanContest.json
# mteb/descriptive_stats/Retrieval/CodeTransOceanDL.json
# mteb/descriptive_stats/Retrieval/CosQA.json
# mteb/descriptive_stats/Retrieval/JaqketRetrieval.json
# mteb/descriptive_stats/Retrieval/NFCorpus.json
# mteb/descriptive_stats/Retrieval/StackOverflowQA.json
# mteb/descriptive_stats/Retrieval/SyntheticText2SQL.json
# mteb/descriptive_stats/Retrieval/Touche2020.json
# mteb/descriptive_stats/Retrieval/Touche2020Retrieval.v3.json
# mteb/descriptive_stats/Retrieval/mFollowIRCrossLingualInstructionRetrieval.json
# mteb/descriptive_stats/Retrieval/mFollowIRInstructionRetrieval.json
# mteb/evaluation/MTEB.py
# mteb/evaluation/evaluators/RetrievalEvaluator.py
# mteb/leaderboard/table.py
# mteb/model_meta.py
# mteb/models/arctic_models.py
# mteb/models/e5_models.py
# mteb/models/nomic_models.py
# mteb/models/sentence_transformers_models.py
# mteb/tasks/PairClassification/multilingual/XStance.py
# mteb/tasks/Reranking/zho/CMTEBReranking.py
# mteb/tasks/STS/por/SickBrSTS.py
# tests/test_benchmark/mock_tasks.py
* fix: bm25s implementation * correct library name --------- Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com>
* fix: Add training dataset to model meta Addresses #1556 * Added docs * format
… for visualization (#1564) * feat: batch requests to cohere models * fix: use correct task_type * feat: use tqdm with openai * fix: explicitly set `show_progress_bar` to False
# Conflicts:
# mteb/model_meta.py
Nice! I was wondering, actually, whether this could be merged to main instead? I don't think there were any major compatibility-breaking changes (as in, everything should run the same as before).
KennethEnevoldsen left a comment
Looks great. I'm very happy about this!
It was a big frustration for me in #1567 so I am very happy to see it. Did you do it all manually?
Nice! I was wondering, actually, whether this could be merged to main instead? I don't think there were any major compatibility-breaking changes (as in, everything should run the same as before).
Before, you could e.g. do `from mteb import load_datasets`. I believe this will no longer be possible.
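A minimal sketch of why removing star imports can change what is importable from a package. It assumes the name was only reachable as a side effect of a `*` import; the modules `fake_lib`, `star_pkg`, and `explicit_pkg` and the function `get_tasks` are hypothetical stand-ins built in memory, not the actual mteb layout.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module from source and register it in sys.modules."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


# Hypothetical library that exposes a public helper.
make_module("fake_lib", "def load_datasets():\n    return 'loaded'\n")

# A package using a star import: every public name of fake_lib leaks through.
star_pkg = make_module("star_pkg", "from fake_lib import *\n")

# A package using an explicit, private import and a declared public API.
explicit_pkg = make_module(
    "explicit_pkg",
    "from fake_lib import load_datasets as _load_datasets\n"
    "def get_tasks():\n"
    "    return _load_datasets()\n"
    "__all__ = ['get_tasks']\n",
)

# The star import re-exports load_datasets by accident; the explicit one does not.
star_leaks = hasattr(star_pkg, "load_datasets")
explicit_leaks = hasattr(explicit_pkg, "load_datasets")
print(star_leaks, explicit_leaks)
```

Code that relied on the accidental re-export breaks after such a change, which is the kind of incompatibility worth checking before merging to main.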
I see a script being used to generate these. To see changes from this PR, I was switching the commits one at a time. Might be nice if the merge from main was separated.
Yes, as @isaac-chung mentioned, I wrote a script to generate imports for tasks, but for other directories, I did it manually. For future PRs I won't merge
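The script itself is not shown in this thread. As an illustration only, one way to generate explicit imports is to parse each task file with `ast` and list its public top-level classes. The function names, the `.ExampleRetrieval` module path, and the sample source below are all hypothetical.

```python
from __future__ import annotations

import ast


def public_classes(source: str) -> list[str]:
    """Return sorted names of public top-level classes defined in the source."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in tree.body
        if isinstance(node, ast.ClassDef) and not node.name.startswith("_")
    )


def explicit_import_line(module_path: str, source: str) -> str:
    """Build a 'from X import A, B' line, or an empty string if nothing is public."""
    names = public_classes(source)
    if not names:
        return ""
    return f"from {module_path} import {', '.join(names)}"


# Hypothetical task file contents used only for this demonstration.
example_source = '''
import datasets

class ExampleRetrieval:
    pass

class _Helper:
    pass

class AnotherTask:
    pass
'''

line = explicit_import_line(".ExampleRetrieval", example_source)
print(line)  # from .ExampleRetrieval import AnotherTask, ExampleRetrieval
```

Only class definitions are collected here, on the assumption that task modules export classes; imported names such as `datasets` are deliberately left out so they are not re-exported.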
isaac-chung left a comment
Thanks for tackling this :)
Might be worth running the same benchmarks in #1463 again as a comparison.
Checklist
* `make test`
* `make lint`

Ref #1463

Also merged changes from `main` and found 3 datasets that were previously never imported: