[v2] Merge main 30 08 #3102
Merged
* feat: unify text and image embeddings for all tasks * fix: uniform batch size * fix: update error message * fix: update code task * fix: update max length * fix: apply review suggestions
* feat: add KaLM_Embedding_X_0605 in kalm_models * Update kalm_models.py for lint format * kalm-emb-v2 * kalm-emb-v2 * kalm-emb-v2 * kalm-emb-v2 * kalm-emb-v2 --------- Co-authored-by: xinshuohu <xinshuohu@tencent.com> Co-authored-by: Xinshuo Hu <yanshek.woo@gmail.com>
* Adding Classification Evaluator test * Modifications due to the comments * Update tests/test_evaluators/test_ClassificationEvaluator.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update tests/test_evaluators/test_ClassificationEvaluator.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Modifications due to the comments * Modifications due to the comments --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* adding vidore benchmarks * fix typo * clean vidore names + per lang eval * lint * vidore names * bibtex fix * fix revision * vidore v2 citation * update citation format and fix per-language mappings * lint: citations * typo citations * fix revisions * lint * fix colnomic3b revision * fix colqwen2.5 revision + latest repo version * fix query augmentation tokens * colsmol revision
* Adding Classification Evaluator test * Modifications due to the comments * Update tests/test_evaluators/test_ClassificationEvaluator.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update tests/test_evaluators/test_ClassificationEvaluator.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Modifications due to the comments * Modifications due to the comments * Adding STSEvaluator and SummarizationEvaluator tests * Correcting due to the comments * Correcting due to the comments --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* Classification dataset cleaning * Update pull request number * Fix metadata test * fix formatting * add script for cleaning
Add JapaneseSentimentClassification
add opensearch inf-free models Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Add BareExamQA retrieval task * ran linter * updated details * updated details * fixed subtype name * fixed changes * ran linter again
Update adding_a_dataset.md
specify revision for opensearch
…2939) The leaderboard would have (silent) errors where `get_benchmark` led to a KeyError because "selector_state" was passed as a default value. Setting `DEFAULT_BENCMARK_NAME` as the value solves this issue.
* docs: Update adding_a_dataset.md * Update docs/adding_a_dataset.md
* BSARD loader fixed * BSARDv2 metadata fixed * Update mteb/tasks/Retrieval/fra/BSARDRetrieval.py --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* Added govreport task * Updated description
* Added BillSum datasets * fixed billsumca * Updated BillSumCA description * Updated BillSumUS description * Update mteb/tasks/Retrieval/eng/BillSumCA.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/BillSumUS.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * lint * lint --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
…2716) * Add RuSciBench * fix bitext mining lang * Add regression task * fix init * add missing files * Improve description * Add superseded_by * fix lint * Update regression task to match with v2 * Add stratified_subsampling for regression task * Add bootstrap for regression task * Rename task class, add model as evaluator argument * fix import * fix import 2 * fixes * fix * Rename regression model protocol
* Comment out bibtex formatting * Remove `-n auto` * get back bibtex * try limiting versions * revert coverage * revert coverage --------- Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* feat - Combine Plots and Tables into a Single Tab #3009 * feat - Resize the plot to make it more readable * feat - Remove the (radar chart) * feat - Add a comment stating that it only shows the Top 5 models in the table. * feat - adjust layout * Update mteb/leaderboard/app.py * format --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
# Conflicts: # Makefile # docs/adding_a_dataset.md # mteb/abstasks/AbsTaskRetrieval.py # mteb/abstasks/TaskMetadata.py # mteb/abstasks/__init__.py # mteb/benchmarks/get_benchmark.py # mteb/encoder_interface.py # mteb/evaluation/evaluators/Image/Any2AnyRetrievalEvaluator.py # mteb/evaluation/evaluators/RerankingEvaluator.py # mteb/evaluation/evaluators/RetrievalEvaluator.py # mteb/evaluation/evaluators/__init__.py # mteb/leaderboard/app.py # mteb/leaderboard/benchmark_selector.py # mteb/models/cohere_v.py # mteb/models/model_implementations/bedrock_models.py # mteb/models/model_implementations/cohere_models.py # mteb/models/model_implementations/colbert_models.py # mteb/models/model_implementations/colpali_models.py # mteb/models/model_implementations/colqwen_models.py # mteb/models/model_implementations/colsmol_models.py # mteb/models/model_implementations/gme_v_models.py # mteb/models/model_implementations/google_models.py # mteb/models/model_implementations/jasper_models.py # mteb/models/model_implementations/jina_models.py # mteb/models/model_implementations/kalm_models.py # mteb/models/model_implementations/llm2vec_models.py # mteb/models/model_implementations/nomic_models.py # mteb/models/model_implementations/ru_sentence_models.py # mteb/models/model_implementations/voyage_models.py # mteb/models/model_implementations/voyage_v.py # mteb/models/overview.py # mteb/models/sentence_transformer_wrapper.py # mteb/models/vlm2vec_models.py # mteb/models/wrapper.py # mteb/tasks/BitextMining/__init__.py # mteb/tasks/Classification/__init__.py # mteb/tasks/Classification/ces/CSFDCZMovieReviewSentimentClassification.py # mteb/tasks/Classification/ces/CzechProductReviewSentimentClassification.py # mteb/tasks/Classification/ces/CzechSoMeSentimentClassification.py # mteb/tasks/Classification/dan/AngryTweetsClassification.py # mteb/tasks/Classification/dan/DKHateClassification.py # mteb/tasks/Classification/dan/DanishPoliticalCommentsClassification.py # 
mteb/tasks/Classification/dan/DdiscoCohesionClassification.py # mteb/tasks/Classification/deu/GermanPoliticiansTwitterSentimentClassification.py # mteb/tasks/Classification/deu/TenKGnadClassification.py # mteb/tasks/Classification/eng/AmazonPolarityClassification.py # mteb/tasks/Classification/eng/ArxivClassification.py # mteb/tasks/Classification/eng/Banking77Classification.py # mteb/tasks/Classification/eng/DBpediaClassification.py # mteb/tasks/Classification/eng/EmotionClassification.py # mteb/tasks/Classification/eng/FinancialPhrasebankClassification.py # mteb/tasks/Classification/eng/FrenkEnClassification.py # mteb/tasks/Classification/eng/ImdbClassification.py # mteb/tasks/Classification/eng/LegalBenchClassification.py # mteb/tasks/Classification/eng/NewsClassification.py # mteb/tasks/Classification/eng/PatentClassification.py # mteb/tasks/Classification/eng/PoemSentimentClassification.py # mteb/tasks/Classification/eng/SDSEyeProtectionClassification.py # mteb/tasks/Classification/eng/SDSGlovesClassification.py # mteb/tasks/Classification/eng/ToxicChatClassification.py # mteb/tasks/Classification/eng/ToxicConversationsClassification.py # mteb/tasks/Classification/eng/TweetSentimentExtractionClassification.py # mteb/tasks/Classification/eng/TweetTopicSingleClassification.py # mteb/tasks/Classification/eng/WikipediaBioMetChemClassification.py # mteb/tasks/Classification/eng/WikipediaChemFieldsClassification.py # mteb/tasks/Classification/eng/WikipediaCompChemSpectroscopyClassification.py # mteb/tasks/Classification/eng/WikipediaCrystallographyAnalyticalClassification.py # mteb/tasks/Classification/eng/WikipediaTheoreticalAppliedClassification.py # mteb/tasks/Classification/eng/YahooAnswersTopicsClassification.py # mteb/tasks/Classification/eng/YelpReviewFullClassification.py # mteb/tasks/Classification/est/estonian_valence.py # mteb/tasks/Classification/fas/FaMTEBClassification.py # mteb/tasks/Classification/fil/FilipinoHateSpeechClassification.py # 
mteb/tasks/Classification/fin/FinToxicityClassification.py # mteb/tasks/Classification/fra/FrenchBookReviews.py # mteb/tasks/Classification/fra/MovieReviewSentimentClassification.py # mteb/tasks/Classification/guj/GujaratiNewsClassification.py # mteb/tasks/Classification/heb/HebrewSentimentAnalysis.py # mteb/tasks/Classification/hin/HindiDiscourseClassification.py # mteb/tasks/Classification/hin/SentimentAnalysisHindi.py # mteb/tasks/Classification/hrv/FrenkHrClassification.py # mteb/tasks/Classification/ind/IndonesianIdClickbaitClassification.py # mteb/tasks/Classification/ind/IndonesianMongabayConservationClassification.py # mteb/tasks/Classification/ita/ItalianLinguistAcceptabilityClassification.py # mteb/tasks/Classification/jav/JavaneseIMDBClassification.py # mteb/tasks/Classification/jpn/WRIMEClassification.py # mteb/tasks/Classification/kan/KannadaNewsClassification.py # mteb/tasks/Classification/kor/KlueTC.py # mteb/tasks/Classification/kor/KorHateClassification.py # mteb/tasks/Classification/kor/KorSarcasmClassification.py # mteb/tasks/Classification/kur/KurdishSentimentClassification.py # mteb/tasks/Classification/mal/MalayalamNewsClassification.py # mteb/tasks/Classification/mar/MarathiNewsClassification.py # mteb/tasks/Classification/mkd/MacedonianTweetSentimentClassification.py # mteb/tasks/Classification/mya/MyanmarNews.py # mteb/tasks/Classification/nep/NepaliNewsClassification.py # mteb/tasks/Classification/nld/DutchBookReviewSentimentClassification.py # mteb/tasks/Classification/nob/NoRecClassification.py # mteb/tasks/Classification/nob/NorwegianParliamentClassification.py # mteb/tasks/Classification/ory/OdiaNewsClassification.py # mteb/tasks/Classification/pol/PolishClassification.py # mteb/tasks/Classification/ron/Moroco.py # mteb/tasks/Classification/ron/RomanianReviewsSentiment.py # mteb/tasks/Classification/ron/RomanianSentimentClassification.py # mteb/tasks/Classification/rus/GeoreviewClassification.py # 
mteb/tasks/Classification/rus/HeadlineClassification.py # mteb/tasks/Classification/rus/InappropriatenessClassification.py # mteb/tasks/Classification/rus/RuReviewsClassification.py # mteb/tasks/Classification/rus/RuSciBenchGRNTIClassification.py # mteb/tasks/Classification/rus/RuSciBenchOECDClassification.py # mteb/tasks/Classification/rus/ru_toixic_classification_okmlcup.py # mteb/tasks/Classification/rus/senti_ru_eval.py # mteb/tasks/Classification/sin/SinhalaNewsClassification.py # mteb/tasks/Classification/sin/SinhalaNewsSourceClassification.py # mteb/tasks/Classification/slk/CSFDSKMovieReviewSentimentClassification.py # mteb/tasks/Classification/slk/SlovakHateSpeechClassification.py # mteb/tasks/Classification/slv/FrenkSlClassification.py # mteb/tasks/Classification/spa/SpanishNewsClassification.py # mteb/tasks/Classification/spa/SpanishSentimentClassification.py # mteb/tasks/Classification/ssw/SiswatiNewsClassification.py # mteb/tasks/Classification/svk/SlovakMovieReviewSentimentClassification.py # mteb/tasks/Classification/swa/SwahiliNewsClassification.py # mteb/tasks/Classification/swe/DalajClassification.py # mteb/tasks/Classification/swe/SweRecClassification.py # mteb/tasks/Classification/swe/SwedishSentimentClassification.py # mteb/tasks/Classification/tam/TamilNewsClassification.py # mteb/tasks/Classification/tel/TeluguAndhraJyotiNewsClassification.py # mteb/tasks/Classification/tha/WisesightSentimentClassification.py # mteb/tasks/Classification/tsn/TswanaNewsClassification.py # mteb/tasks/Classification/tur/TurkishMovieSentimentClassification.py # mteb/tasks/Classification/tur/TurkishProductSentimentClassification.py # mteb/tasks/Classification/ukr/UkrFormalityClassification.py # mteb/tasks/Classification/urd/UrduRomanSentimentClassification.py # mteb/tasks/Classification/vie/VieStudentFeedbackClassification.py # mteb/tasks/Classification/zho/CMTEBClassification.py # mteb/tasks/Classification/zho/YueOpenriceReviewClassification.py # 
mteb/tasks/Classification/zul/IsiZuluNewsClassification.py # mteb/tasks/Clustering/__init__.py # mteb/tasks/Image/Any2AnyRetrieval/__init__.py # mteb/tasks/PairClassification/__init__.py # mteb/tasks/Reranking/__init__.py # mteb/tasks/Retrieval/__init__.py # mteb/tasks/STS/__init__.py # pyproject.toml # tests/test_benchmark/mock_models.py # tests/test_benchmark/test_benchmark.py # tests/test_models/test_model_meta.py # tests/test_reproducible_workflow.py
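The leaderboard fix described in the `…2939)` commit message above can be illustrated with a minimal sketch. It assumes a plain dictionary registry; `BENCHMARKS`, `DEFAULT_NAME`, `on_select_buggy`, and `on_select_fixed` are hypothetical stand-ins and are not the actual mteb leaderboard code. The sketch only shows why a UI state key used as a default value makes a name lookup fail, and why a real benchmark name as the default avoids that.

```python
# Hypothetical stand-ins for illustration; not the actual mteb leaderboard code.
DEFAULT_NAME = "default-benchmark"

BENCHMARKS = {
    "default-benchmark": ["TaskA", "TaskB"],
    "other-benchmark": ["TaskC"],
}


def get_benchmark(name: str) -> list[str]:
    """Look up a benchmark by name; an unknown name raises KeyError."""
    return BENCHMARKS[name]


def on_select_buggy(selected: str = "selector_state") -> list[str]:
    # The default is a UI state key rather than a benchmark name,
    # so calling this without an argument fails inside get_benchmark.
    return get_benchmark(selected)


def on_select_fixed(selected: str = DEFAULT_NAME) -> list[str]:
    # The default is a registered benchmark name, so the lookup succeeds.
    return get_benchmark(selected)
```

If a UI framework swallows callback exceptions, a failure like the one in `on_select_buggy` can go unnoticed, which would be consistent with the "(silent) errors" wording in the commit message.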
Member
Author
The tests are now failing because some tasks have missing metadata. I'll calculate it later.
KennethEnevoldsen
approved these changes
Sep 1, 2025
KennethEnevoldsen
left a comment
Contributor
Minor question but generally looks good - thanks for doing the merge
Contributor
What do you mean when you say that you aligned it with classification?
Member
Author
Mostly all in e78d04c. Changed `LinearRegressionEvaluator` `__init__` and `__call__` to work with datasets; in `AbsTaskTextRegression`, aligned `_evaluate_subset` with v2.
Contributor
Cool - made me think that it might be easy to merge it with the classification (but maybe not) - regardless it is for another PR - feel free to merge
This was referenced Oct 10, 2025
Also aligned Regression task with classification