Merge main 05 10 by Samoed · Pull Request #3246 · embeddings-benchmark/mteb

Samoed · 2025-10-03T20:01:43Z

If you add a model or a dataset, please add the corresponding checklist:

Automatically generated by python-semantic-release

* model: Add BMRetriever * Update mteb/models/bmretriever_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/bmretriever_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * fix: remove trust_remote_code option * feat: implement BMREtrieverWrapper based on InstructSentenceTransformerWrapper * refactor: update training datasets for bmretriever --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Revert "Ci: test out GH models with welcoming new comers (#3112)" This reverts commit 73a35e0.

* add codefuse models * add codefuse models * Update codefuse_models.py * lint codefuse.py

Automatically generated by python-semantic-release

* Adding Cohere's output_dimension and embedding_type parameter Cohere's embed-v4 binary and int8 * Correcting due to comments

* feat: add swedish cpc patent classifications to mteb * fix: formatting and init imports * fix: update mteb task according to feedback * fix: perform citation and code formatting * fix: add train and test split for both datasets

* fix: delete kwargs for similarity score in ColPaliEngineWrapper for method behavior * chore: fix colpali_models similarity handle device

Automatically generated by python-semantic-release

* fix(models): prevent EOS token truncation for BMRetriever * refactor(models): refactor tokenizer setup in `InstructSentenceTransformerWrapper` * fix(models): correct eos token handling in `BMRetrieverWrapper`

Automatically generated by python-semantic-release

* update giga embeddings * update giga embeddings * 3b-september-2025 * fixed * lint * Update mteb/models/ru_sentence_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * change revision due to flash-attn dependency * change apply_instruction_to_passages --------- Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru>

* feat - Split create_tables into static Benchmark methods * feat - format * Update mteb/leaderboard/table.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * feat - remove search query;take benchmark result as input;addressing the circular import, * feat - format * Update mteb/benchmarks/benchmark.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/benchmarks/benchmark.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * feat - use to_dataframe;clean table.py;move creat_table * feat - fix circular import * feat - clean-up * feat - format --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Automatically generated by python-semantic-release

Adding another voyageai model

* Update qzhou_models.py * Update qzhou_models.py * reformat script code * Update configuration * According to our new decision, the model name has been changed to "QZhou-Embedding-Zh". * Fix variable naming issues.

* add youtu models * add a blank line * fix the optional dependencies and lint the code * remove unused dependencies and reformat * revise prompt_type * update youtu_models --------- Co-authored-by: springxchen <springxchen@tencent.com>

* add software issue localization datasets * add software issue localization datasets * update and add multilingual datasets * fix citation format issues * Update mteb/tasks/Reranking/eng/SWEbenchVerifiedReranking.py * fix linting issues --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* feat - adjust Rteb's Benchmark * feat - add blank * fix menu names * Update mteb/leaderboard/benchmark_selector.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * moving around tasks * fix: Update RTEB summary columns (#3226) * fix(models): ensure prompt_type is passed to format_instruction (#3216) * 1.38.58 Automatically generated by python-semantic-release * Adding Cohere's output_dimension and embedding_type parameter (#3204) * Adding Cohere's output_dimension and embedding_type parameter Cohere's embed-v4 binary and int8 * Correcting due to comments * dataset: add swedish cpc patent classifications to mteb (#3072) * feat: add swedish cpc patent classifications to mteb * fix: formatting and init imports * fix: update mteb task according to feedback * fix: perform citation and code formatting * fix: add train and test split for both datasets * fix: AttributeError in ColPaliEngineWrapper similarity method (#3177) * fix: delete kwargs for similarity score in ColPaliEngineWrapper for method behavior * chore: fix colpali_models similarity handle device * Update tasks & benchmarks tables * 1.38.59 Automatically generated by python-semantic-release * fix: prevent EOS token truncation (#3218) * fix(models): prevent EOS token truncation for BMRetriever * refactor(models): refactor tokenizer setup in `InstructSentenceTransformerWrapper` * fix(models): correct eos token handling in `BMRetrieverWrapper` * 1.38.60 Automatically generated by python-semantic-release * Update giga embeddings (#3210) * update giga embeddings * update giga embeddings * 3b-september-2025 * fixed * lint * Update mteb/models/ru_sentence_models.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * change revision due to flash-attn dependency * change apply_instruction_to_passages --------- Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> 
Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru> * fix: Refactor split create_tables into static Benchmark methods (#3126) * feat - Split create_tables into static Benchmark methods * feat - format * Update mteb/leaderboard/table.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * feat - remove search query;take benchmark result as input;addressing the circular import, * feat - format * Update mteb/benchmarks/benchmark.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/benchmarks/benchmark.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * feat - use to_dataframe;clean table.py;move creat_table * feat - fix circular import * feat - clean-up * feat - format --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * 1.38.61 Automatically generated by python-semantic-release * Extending the RTEB benchmark (#3223) Adding another voyageai model * Update tasks & benchmarks tables * feat - filter_by_privacy * feat - add new fields for rteb part * feat - getattr * feat - adjust privacy filter logic * feat - enhance summary table column renaming and add 'is_public' field mapping * fix: remove unused 'is_public' attribute from TaskResult --------- Co-authored-by: Yongbin Choi <whybe.choi@gmail.com> Co-authored-by: semantic-release <semantic-release> Co-authored-by: fzoll <5575946+fzoll@users.noreply.github.com> Co-authored-by: Atheer <atheer2104@protonmail.com> Co-authored-by: Yong woo Song <ywsong.dev@kakao.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com> Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru> Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: smile <smile@pinai.io> Co-authored-by: ethan <smiletoye@gmail.com> * removed show_rteb args * avoid defining function where we can just use the metadata * minor fixes * minor fixes * fix: Correct logic for filtering public tasks in ModelResult class (#3230) Co-authored-by: ethan <smiletoye@gmail.com> --------- Co-authored-by: q275343119 <275343119@qq.com> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: 笑尿伊人 <44760272+q275343119@users.noreply.github.com> Co-authored-by: Yongbin Choi <whybe.choi@gmail.com> Co-authored-by: fzoll <5575946+fzoll@users.noreply.github.com> Co-authored-by: Atheer <atheer2104@protonmail.com> Co-authored-by: Yong woo Song <ywsong.dev@kakao.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Egor <31567312+ekolodin@users.noreply.github.com> Co-authored-by: Kolodin Egor <eikolodin@sberbank.ru> Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Неизвестный Пользователь722497 <dolegosmirnov@sberbank.ru> Co-authored-by: smile <smile@pinai.io> Co-authored-by: ethan <smiletoye@gmail.com>

Automatically generated by python-semantic-release

* fix: Add rteb submission references and improve descriptions. * Added evaluation request * added field for tasks

Automatically generated by python-semantic-release

* Human Subsets Tasks * Fixed Multilingual Classification Subset * linting * fix citations format * make lint * fix tests * remove human folder * fix relative imports * add adapted_from for all human subsets * fix pydantic errors * add benchmark object * make benchmark discoverable * bibtex test * Apply suggestion Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * Apply suggestions from code review Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> * rename & reupload * upd tests * upd tests again * add model * add benchmark to leaderboard * change branch of leaderboard * remove branch of load data * fix model meta path * make mteb importable * update repo * Update mteb/benchmarks/benchmarks/benchmarks.py * Update mteb/leaderboard/benchmark_selector.py * Update mteb/load_results/load_results.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Adnan El Assadi <aassadi22@ku.edu.tr> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> Co-authored-by: AdnanElAssadi56 <115242814+AdnanElAssadi56@users.noreply.github.com>

* Remove 'HUME(v1)' from leaderboard benchmark * lint

* update adding_a_benchmark.md documentation * fix numbers

* fix: Further specified macro-language code for Norwegian. "nor" is a macro-language code that covers Bokmål and Nynorsk (both Norwegian), which means that datasets tagged only with "nor" are missed when filtering by "nob" or "nno". Specifying the codes more precisely should let these datasets be found. * further specified macro-language "nor"
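The problem described in the Norwegian fix can be sketched as follows. This is a hypothetical illustration, not mteb's actual filtering code: it assumes the language filter keeps a task only when one of its codes exactly matches a requested code, and the `matches` helper and the `before_fix`/`after_fix` metadata lists are made up for the example.

```python
# Hypothetical sketch: mteb's real task filtering may differ.


def matches(task_languages: list[str], requested: list[str]) -> bool:
    """Exact-match filter: keep a task if any of its codes was requested."""
    return bool(set(task_languages) & set(requested))


# Hypothetical metadata for the same dataset before and after the fix.
before_fix = ["nor"]        # macro-language code only
after_fix = ["nob", "nno"]  # individual languages covered by "nor"

# With exact matching, "nob"/"nno" queries only find the more specific tags.
for query in (["nob"], ["nno"], ["nor"]):
    print(query, matches(before_fix, query), matches(after_fix, query))
```

Under this assumption, a dataset tagged only with "nor" is invisible to a "nob" or "nno" query, while listing the individual codes makes it discoverable. The trade-off is that a plain "nor" query no longer matches unless the macro-language code is kept as well.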

Automatically generated by python-semantic-release

# Conflicts:
#	docs/benchmarks.md
#	mteb/benchmarks/benchmark.py
#	mteb/benchmarks/benchmarks/__init__.py
#	mteb/benchmarks/benchmarks/benchmarks.py
#	mteb/evaluation/evaluators/RerankingEvaluator.py
#	mteb/leaderboard/benchmark_selector.py
#	mteb/leaderboard/table.py
#	mteb/load_results.py
#	mteb/models/abs_encoder.py
#	mteb/models/instruct_wrapper.py
#	mteb/models/model_implementations/cohere_models.py
#	mteb/models/model_implementations/cohere_v.py
#	mteb/models/model_implementations/ru_sentence_models.py
#	mteb/models/model_implementations/youtu_models.py
#	mteb/models/overview.py
#	mteb/results/benchmark_results.py
#	mteb/tasks/Classification/__init__.py
#	mteb/tasks/Clustering/__init__.py
#	mteb/tasks/MultiLabelClassification/__init__.py
#	mteb/tasks/Reranking/__init__.py
#	mteb/tasks/Retrieval/multilingual/MKQARetrieval.py
#	mteb/tasks/STS/__init__.py
#	scripts/make_leaderboard.py

* fix python39 transformers * fix

aggregate by subset for HUMEv1

Fix AbsTaskTextRegression

* feat - add Japanese * feat - use mteb.get_benchmark * fix - 3.9 test error * Revert "fix - 3.9 test error" This reverts commit 6bfee53. * fix - 3.9 test error

# Conflicts:
#	mteb/benchmarks/benchmarks/__init__.py
#	mteb/benchmarks/benchmarks/benchmarks.py
#	mteb/models/bm25.py

whybe-choi and others added 30 commits September 21, 2025 08:28

fix: Correct metadata for ArguAna dataset (#3202)

90e9f43

Update tasks & benchmarks tables

920dafe

1.38.57

cd37c7a

Automatically generated by python-semantic-release

Revert "Ci: test out GH models with welcoming new comers" (#3206)

6e72dc0

Revert "Ci: test out GH models with welcoming new comers (#3112)" This reverts commit 73a35e0.

model: Add Codefuse models (#3205)

4f6d791

* add codefuse models * add codefuse models * Update codefuse_models.py * lint codefuse.py

fix(models): ensure prompt_type is passed to format_instruction (#3216)

82d9e29

1.38.58

d0d427d

Automatically generated by python-semantic-release

Adding Cohere's output_dimension and embedding_type parameter (#3204)

08bba49

* Adding Cohere's output_dimension and embedding_type parameter Cohere's embed-v4 binary and int8 * Correcting due to comments

fix: AttributeError in ColPaliEngineWrapper similarity method (#3177)

8c180d4

* fix: delete kwargs for similarity score in ColPaliEngineWrapper for method behavior * chore: fix colpali_models similarity handle device

Update tasks & benchmarks tables

0aacba4

1.38.59

2e292cf

Automatically generated by python-semantic-release

fix: prevent EOS token truncation (#3218)

f58ac2b

* fix(models): prevent EOS token truncation for BMRetriever * refactor(models): refactor tokenizer setup in `InstructSentenceTransformerWrapper` * fix(models): correct eos token handling in `BMRetrieverWrapper`

1.38.60

3e86531

Automatically generated by python-semantic-release

1.38.61

a52723a

Automatically generated by python-semantic-release

Extending the RTEB benchmark (#3223)

4f58684

Adding another voyageai model

Update tasks & benchmarks tables

7f5990a

model: New qzmodel (#3211)

e299345

* Update qzhou_models.py * Update qzhou_models.py * reformat script code * Update configuration * According to our new decision, the model name has been changed to "QZhou-Embedding-Zh". * Fix variable naming issues.

model: Update Youtu embedding model (#3227)

0000ae2

* add youtu models * add a blank line * fix the optional dependencies and lint the code * remove unused dependencies and reformat * revise prompt_type * update youtu_models --------- Co-authored-by: springxchen <springxchen@tencent.com>

Update tasks & benchmarks tables

65f29e6

Update tasks & benchmarks tables

867105f

1.39.0

cf26684

Automatically generated by python-semantic-release

fix: Add submission references for RTEB (#3233)

600c290

* fix: Add rteb submission references and improve descriptions. * Added evaluation request * added field for tasks

1.39.1

12fe80b

Automatically generated by python-semantic-release

github-actions Bot and others added 12 commits October 2, 2025 06:17

Update tasks & benchmarks tables

9a606a0

Remove 'HUME(v1)' from leaderboard benchmark (#3236)

e419b54

* Remove 'HUME(v1)' from leaderboard benchmark * lint

docs: Update adding benchmark documentation (#3229)

50aa4ac

* update adding_a_benchmark.md documentation * fix numbers

Update tasks & benchmarks tables

810ae28

1.39.2

9249630

Automatically generated by python-semantic-release

fix max tokens (#3243)

2f6eb2a

fix models

3cea9e4

fix imports

8902461

fix task import

3b95bb5

reupload HUME tasks

56b0e4b

Samoed added the v2 label Oct 3, 2025

Samoed mentioned this pull request Oct 3, 2025

dataset: Add Software Issue Localization Datasets #3178

Merged

7 tasks

Samoed added 2 commits October 4, 2025 00:16

reupload SWE tasks

9a7723f

add stats

34a3b90

Samoed requested a review from KennethEnevoldsen October 4, 2025 16:01

Samoed and others added 8 commits October 5, 2025 11:06

fix python39 transformers compatibility (#3254)

85e1dd9

* fix python39 transformers * fix

Aggregate by subset for HUMEv1 (#3255)

36901eb

aggregate by subset for HUMEv1

Update tasks & benchmarks tables

89bec7d

Fix AbsTaskTextRegression task (#3257)

08b98cd

Fix AbsTaskTextRegression

Added Japanese to Retrieval (#3252)

53b1c29

* feat - add Japanese * feat - use mteb.get_benchmark * fix - 3.9 test error * Revert "fix - 3.9 test error" This reverts commit 6bfee53. * fix - 3.9 test error

Update tasks & benchmarks tables

c8ae52c

fix bm25 on small datasets (#3261)

237d8dc

Merge branch 'main' into merge_main_05_10

6e2766d

# Conflicts:
#	mteb/benchmarks/benchmarks/__init__.py
#	mteb/benchmarks/benchmarks/benchmarks.py
#	mteb/models/bm25.py

KennethEnevoldsen enabled auto-merge (squash) October 6, 2025 11:59

Samoed force-pushed the merge_main_05_10 branch from ec748ef to 6e2766d October 6, 2025 12:08

KennethEnevoldsen merged commit 3529e93 into v2.0.0 Oct 6, 2025
20 checks passed

KennethEnevoldsen deleted the merge_main_05_10 branch October 6, 2025 14:08

Samoed mentioned this pull request Oct 14, 2025

Recompute descriptive stats with new format #3279

Closed

KennethEnevoldsen merged 52 commits into v2.0.0 from merge_main_05_10

Samoed commented Oct 3, 2025

15 participants