[v2] Merge main 20 09 by Samoed · Pull Request #3193 · embeddings-benchmark/mteb

Samoed · 2025-09-20T11:49:52Z

No description provided.

Automatically generated by python-semantic-release

* align task prompt dict with `PromptType` * use value instead of enum

Automatically generated by python-semantic-release

…3090)

* Add ModelMeta for OrdalieTech/Solon-embeddings-mini-beta-1.1
* Add training_datasets (common_corpus, fineweb, wiki_fr, private LLM-synth)
* Format with ruff + add loader per review
* Apply ruff format/fixes
* Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py (Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>) (2 commits)
* Register OrdalieTech/Solon-embeddings-mini-beta-1.1 in overview (ModelMeta + loader)
* Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)
* fix import
* Add memory_usage_mb=808.0 and required fields to ModelMeta
* Fix parameter count to 210 million

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
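The commit above records both a parameter count (210 million) and `memory_usage_mb=808.0`. A rough way to relate the two is to multiply the parameter count by the bytes per parameter. The sketch below assumes fp32 weights (4 bytes per parameter) and MiB units; the PR does not say how 808.0 was actually derived, and `estimate_memory_mb` is a hypothetical helper.

```python
def estimate_memory_mb(n_parameters: int, bytes_per_param: int = 4) -> float:
    """Rough weight-memory estimate in MiB (fp32, i.e. 4 bytes/param, by default)."""
    return n_parameters * bytes_per_param / 2**20


# Hypothetical check for a ~210M-parameter model.
print(round(estimate_memory_mb(210_000_000), 1))     # 801.1 (fp32)
print(round(estimate_memory_mb(210_000_000, 2), 1))  # 400.5 (fp16)
```

Working backwards under the same fp32 assumption, 808.0 MiB corresponds to about 808 × 2^20 / 4 ≈ 211.8M parameters, so the two recorded values are roughly consistent with a count of "about 210 million".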

* Added an include_private parameter to the get_tasks() function that defaults to False
  - This ensures that, by default, tests only run on public datasets
  - Tests can explicitly set include_private=True when they need to test private datasets
* Added an is_public: bool | None = None field to TaskMetadata
  - The field is optional and defaults to None (treated as public)
* Updated the is_filled() method to exclude is_public from the required fields
* Added documentation
* (The list above was committed twice.)
* Corrections based on comments
* Update mteb/abstasks/TaskMetadata.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)
* Update mteb/overview.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)
* Removing the unused filter_tasks_by_privacy function
* Corrections based on comments (4 commits)
* Removing the test case
* Rename the include_private parameter to exclude_private (2 commits)
* Add private tasks tests (2 commits)
* Update tests/test_tasks/test_private_tasks.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)
* Add private tasks tests (3 commits)

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
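The privacy filter described above can be sketched as follows: a task is private only when `is_public` is explicitly `False`, and `None` counts as public. `TaskMetadata` and `get_tasks` here are simplified hypothetical stand-ins, not mteb's real classes. The default `exclude_private=True` is an assumption, chosen to mirror the original `include_private=False` default before the rename.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskMetadata:
    # Hypothetical stand-in for mteb's TaskMetadata; only the fields used here.
    name: str
    is_public: bool | None = None  # None is treated as public


def get_tasks(tasks: list[TaskMetadata], exclude_private: bool = True) -> list[TaskMetadata]:
    """Return the tasks, dropping private ones unless exclude_private is False."""
    if not exclude_private:
        return list(tasks)
    # Only an explicit is_public=False marks a task as private.
    return [t for t in tasks if t.is_public is not False]


all_tasks = [
    TaskMetadata("PublicTask"),
    TaskMetadata("ExplicitlyPublic", is_public=True),
    TaskMetadata("PrivateTask", is_public=False),
]
print([t.name for t in get_tasks(all_tasks)])
# ['PublicTask', 'ExplicitlyPublic']
print(len(get_tasks(all_tasks, exclude_private=False)))
# 3
```

Testing `is not False` rather than truthiness keeps `None` (metadata that has not been filled in) on the public side, which matches "defaults to None (treated as public)".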

Automatically generated by python-semantic-release

test out GH models for welcoming newcomers

* add dataset check on new PR * add extract datasets * run as module * update startswith * update workflow name * add GitPython * export var * same shell session * address review comments * add to docs to say what this script does * add docs

* add youtu models * add a blank line * fix the optional dependencies and lint the code * remove unused dependencies and reformat * revise prompt_type --------- Co-authored-by: springxchen <springxchen@tencent.com>

* Adding quantization support
* Update mteb/models/voyage_models.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)
* Update mteb/model_meta.py (Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>) (2 commits)
* Simplifying the quantization/output_dtype
* Update mteb/model_meta.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>)

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
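To illustrate what a quantized `output_dtype` can mean, here is generic sign-based binary quantization. Each dimension becomes one bit (1 if the value is positive), the bits are packed eight to a byte, and packed vectors are compared with Hamming distance. This is a sketch of the general technique only. It is not Voyage's or mteb's implementation, and `binary_quantize` and `hamming_distance` are hypothetical helper names.

```python
def binary_quantize(embedding: list[float]) -> bytes:
    """Pack the sign bit of each dimension into bytes (1 if value > 0).

    Bits are packed most-significant-bit first; the last byte is
    zero-padded when the length is not a multiple of 8.
    """
    out = bytearray()
    for start in range(0, len(embedding), 8):
        byte = 0
        for i, value in enumerate(embedding[start:start + 8]):
            if value > 0:
                byte |= 1 << (7 - i)
        out.append(byte)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


vec = [0.3, -0.1, 0.2, -0.5, 0.9, 0.0, -0.2, 0.4]
packed = binary_quantize(vec)
print(packed)  # b'\xa9'  (bits 10101001)
print(hamming_distance(packed, binary_quantize([-x for x in vec])))  # 7
```

The trade-off is a 32x size reduction relative to float32, at the cost of keeping only each dimension's sign.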

Automatically generated by python-semantic-release

* model: EmbeddingGemma 300M * Add license and revision

* feat - remove special filtering, keep zero-shot, keep Borda rank
* feat - remove get_rteb_benchmark.py
* feat - delete get_rteb_benchmark.py; RTEB_BENCHMARK_ENTRIES changes
* feat - format
* Update mteb/load_results/benchmark_results.py (Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>)

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
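The change keeps a Borda rank for aggregating benchmark results. Below is a minimal Borda count: on each task, a model earns one point for every model it outperforms, and the points are summed across tasks. This is a generic sketch under stated simplifications. Tied models both get the lower point total, and missing results are ignored. mteb's leaderboard code may handle ties and missing scores differently.

```python
def borda_rank(scores: dict[str, dict[str, float]]) -> list[tuple[str, int]]:
    """Aggregate per-task scores into a Borda ranking.

    scores maps task name -> {model name: score}; higher scores are better.
    """
    points: dict[str, int] = {}
    for task_scores in scores.values():
        for model, score in task_scores.items():
            # One point per model this model strictly beats on this task.
            beaten = sum(1 for other in task_scores.values() if other < score)
            points[model] = points.get(model, 0) + beaten
    # Highest total first; sorted() is stable, so ties keep insertion order.
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)


scores = {
    "TaskA": {"m1": 0.9, "m2": 0.7, "m3": 0.5},
    "TaskB": {"m1": 0.4, "m2": 0.8, "m3": 0.6},
    "TaskC": {"m1": 0.7, "m2": 0.3, "m3": 0.6},
}
print(borda_rank(scores))  # [('m1', 4), ('m2', 3), ('m3', 2)]
```

Because Borda uses only within-task ranks, a task with a wide score spread cannot dominate the aggregate the way it can with a plain mean.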

Automatically generated by python-semantic-release

* chore: add 'Patent retrieval' subtype to TaskMetadata
* feat(retrieval): add DAPFAM patent retrieval tasks (+18 variants)
* Dapfam patent retrieval PR #2946: refactor DAPFAM tasks (explicit classes, license, metadata, custom definition explanation ...) (2 commits)
* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>) (6 commits)
* Changes:
  - Added the option to opt in or out of quantization through the "quantize" argument.
  - Added the option to compute a raw dot product without normalization (to reproduce the paper's method, the "similarity" argument should be "cosine").
  - Removed an unnecessary function and overhauled the task descriptions to make them clearer.
* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py
* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py (Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>) (3 commits)
* Changes made:
  - Overhauled the task descriptions and naming to conform with the naming scheme of mteb retrieval tasks.
  - Similarity is now computed using the similarity function of the passed model.
  - Changed the optional quantization method to conform with the sentence-transformers similarity function.

  To reproduce the paper metrics, one can use the following snippet:

  ```python
  import mteb
  from sentence_transformers import SentenceTransformer

  model_name = "Snowflake/snowflake-arctic-embed-m-v2.0"
  model = (
      SentenceTransformer(
          model_name,
          model_kwargs={"torch_dtype": "float16"},
          trust_remote_code=True,
      )
      .cuda()
      .eval()
  )

  tasks = mteb.get_tasks(
      tasks=[
          "DAPFAMInTitlAbsToTitlAbsClmRetrieval",
          "DAPFAMAllTitlAbsToTitlAbsClmRetrieval",
          "DAPFAMOutTitlAbsToTitlAbsClmRetrieval",
          # add the other 3 remaining tasks ...
      ]
  )

  evaluation = mteb.MTEB(tasks=tasks)
  results = evaluation.run(
      model,
      output_folder=f"mteb_res/{model_name}",
      # If set to False or not set, the obtained ndcg@10 and map@10 will be ~0.001 higher.
      quantize=True,
      encode_kwargs={"batch_size": 32},
  )
  ```
* Changed the default value of quantization to False
* Added the import to all DAPFAM tasks; tested that it works; verified compliance with the checklist
* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py (Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>)
* Added revision numbers to all dataset loading operations as well as to the metadata itself
* Intermediate changes, refresh local branch
* Intermediate changes, refresh local branch again
* Scale back to standard evaluation with empty-set exclusion; various cosmetic/formatting changes
* Minor cosmetic/formatting changes
* Fixed the main metric to be ndcg_at_100, as in the paper
* Removed old code artifacts from previous versions
* Read the appropriate loading arguments from task metadata; removed an unnecessary class attribute
* Reformatted the bibtex (the assertion tries to match a literal string instead of the bibtex formatting, and the format is inconsistent with the arXiv default), fixed the metadata, parameters are read from task metadata, all tests passed
* Refactored data loading to read from metadata class attributes

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
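The DAPFAM changes distinguish a raw dot product from cosine similarity, which is the dot product of L2-normalized vectors. The toy sketch below shows why that choice can reorder retrieval results: the dot product rewards vector length, while cosine rewards direction only. The vectors are made-up examples, not DAPFAM data.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Raw dot product, no normalization."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of the L2-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
doc_short = [0.9, 0.1]  # nearly parallel to the query, small norm
doc_long = [3.0, 3.0]   # 45 degrees off the query, large norm

# The dot product ranks the long vector first; cosine ranks the aligned one first.
print(dot(query, doc_short), dot(query, doc_long))         # 0.9 3.0
print(cosine(query, doc_short) > cosine(query, doc_long))  # True
```

If a model's embeddings are already unit-normalized, the two scores coincide. The distinction only matters for unnormalized outputs, which is why the later commit defers to the model's own similarity function.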

…3185) Correct the batch creation

#3168)

* Adding JapaneseCode1Retrieval as the first non-public dataset
* Transformed dataset
* Adding as private dataset to tests
* Correct the private task test
* Use the sample dataset as a reference (2 commits)
* fix ds loading
* allow on forks
* update action
* remove paths
* try to trigger ci
* add ref
* add permissions
* remove paths
* add paths back
* get back to pull request
* rollback action
* Trying to resolve the token/secret problem (2 commits)
* Update dataset_loading_pr.yml (2 commits)
* Try the latest datasets package (worked for me) (3 commits)
* (last?) try (3 commits)
* Reverting the changes
* Exclude the private datasets from tests
* Apply suggestions from code review

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Solomatin Roman <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

add version check

* Add 12 more closed datasets Extend the RTEB benchmarks * trust_remote_code * trust_remote_code * Enabling JapaneseCode1Retrieval in the RTEB benchmarks * Add closed datasets as private tasks * Correct due to the comment

* Update benchmark to version 2
* make others in benchmark selector one line code
* small changes
* update a few tasks' metadata
* update faintent license with correct form
* remove redundant trust remote codes
* fix hardnegatives revision
* make lint
* fix errors
* apply suggestions
* fix citation problem
* add PR link to benchmark desc
* remove duplicate dataset names in mcinext_models
* update prompts

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>

Automatically generated by python-semantic-release

fix conflicting dependencies

Automatically generated by python-semantic-release

Conflicts:

* mteb/abstasks/TaskMetadata.py
* mteb/custom_validators.py
* mteb/leaderboard/app.py
* mteb/leaderboard/text_segments.py
* mteb/models/model_implementations/google_models.py
* mteb/models/model_implementations/voyage_models.py
* mteb/models/overview.py
* mteb/tasks/Classification/fas/FaMTEBClassification.py
* mteb/tasks/Retrieval/__init__.py
* pyproject.toml
* scripts/extract_model_names.py
* tests/test_models/test_model_meta.py

KennethEnevoldsen · 2025-09-22T15:20:25Z

@@ -0,0 +1,40 @@
+name: Welcome New Contributors


Ahh, didn't see this one. Looking forward to seeing it in action.

Sadly, it's not working (see #3117).

KennethEnevoldsen · 2025-09-22T15:21:08Z

Unsure if we should delete these in v2.

I think we can delete this, because we have a site with all the documentation.

KennethEnevoldsen

Looks good, no particular worries.

fzoll and others added 30 commits September 1, 2025 09:41

fix: Updating the default batch size calculation in the voyage models (#3091)

5851c7a

1.38.50

80966c2

Automatically generated by python-semantic-release

fix: Add @classmethod for @field_validators in TaskMetadata (#3100)

4012517

Align task prompt dict with PromptType (#3101)

7303c15

* align task prompt dict with `PromptType` * use value instead of enum
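One way to read "use value instead of enum" is that a task's prompt dict ends up keyed by the enum's plain string value rather than by the enum member. The `PromptType` members and the `normalize_prompt_keys` helper below are hypothetical stand-ins for illustration, not mteb's actual definitions.

```python
from enum import Enum


class PromptType(str, Enum):
    # Hypothetical members for illustration; mteb's actual enum may differ.
    query = "query"
    document = "document"


def normalize_prompt_keys(prompts: dict) -> dict[str, str]:
    """Key a prompt dict by PromptType values (plain strings)."""
    normalized = {}
    for key, prompt in prompts.items():
        # PromptType(key) accepts an enum member or its string value
        # and raises ValueError for anything else.
        normalized[PromptType(key).value] = prompt
    return normalized


print(normalize_prompt_keys({PromptType.query: "q: ", "document": "d: "}))
# {'query': 'q: ', 'document': 'd: '}
```

Validating through `PromptType(key)` catches typos such as an unknown prompt type at load time, while storing `.value` keeps the dict JSON-serializable.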

1.38.51

b7b5d11

Automatically generated by python-semantic-release

1.38.52

07bf861

Automatically generated by python-semantic-release

ci: test out GH models for welcoming newcomers (#3112)

73a35e0

test out GH models for welcoming newcomers

ci: Dataset check on new PR (#3103)

6e8eba1

* add dataset check on new PR * add extract datasets * run as module * update startswith * update workflow name * add GitPython * export var * same shell session * address review comments * add to docs to say what this script does * add docs
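The commits above mention extracting datasets from a new PR and filtering paths with `startswith`. A minimal sketch of that filtering step follows. It assumes the list of changed file paths is already available, for example from `git diff --name-only`. The function name and the `mteb/tasks/` prefix are assumptions, and the real script (which uses GitPython) may work differently.

```python
def extract_changed_task_files(
    changed_paths: list[str], prefix: str = "mteb/tasks/"
) -> list[str]:
    """Return changed task modules, ignoring package __init__ files."""
    return sorted(
        path
        for path in changed_paths
        if path.startswith(prefix)
        and path.endswith(".py")
        and not path.endswith("__init__.py")
    )


changed = [
    "mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py",
    "mteb/tasks/Retrieval/__init__.py",
    "README.md",
    "tests/test_tasks/test_private_tasks.py",
]
print(extract_changed_task_files(changed))
# ['mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py']
```

A CI job would then load only the datasets referenced by these modules, instead of checking every task on each PR.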

model: add Youtu-Embedding-V1 (#3115)

652ff2b

* add youtu models * add a blank line * fix the optional dependencies and lint the code * remove unused dependencies and reformat * revise prompt_type --------- Co-authored-by: springxchen <springxchen@tencent.com>

1.38.53

647c8c3

Automatically generated by python-semantic-release

model: EmbeddingGemma 300M (#3129)

729f20a

* model: EmbeddingGemma 300M * Add license and revision

Update tasks & benchmarks tables

32c9746

1.38.54

4e5f597

Automatically generated by python-semantic-release

Update tasks & benchmarks tables

b622870

Align max tokens (#3172)

10c4948

Correct the VoyageAI model's batch creation/batch size calculation (#3185)

ed68a89

Correct the batch creation
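A common way to build batches for a hosted embedding API is greedy packing under two limits: a maximum number of texts per request and a maximum total token count per request. The sketch below assumes the API caps both. The limit values are placeholder parameters, not Voyage's real limits, and the whitespace token counter is a crude stand-in for the provider's tokenizer. It is not the code from #3185.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def make_batches(
    texts: list[str],
    max_items: int,
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> list[list[str]]:
    """Greedily group texts so that each batch respects both limits.

    A new batch starts when adding the next text would exceed max_items or
    max_tokens. A single text longer than max_tokens still gets a batch of
    its own; truncating it is left to the caller.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for text in texts:
        n_tokens = count_tokens(text)
        if current and (
            len(current) >= max_items or current_tokens + n_tokens > max_tokens
        ):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches


texts = ["a b c", "d e", "f g h i", "j"]
print(make_batches(texts, max_items=2, max_tokens=5))
# [['a b c', 'd e'], ['f g h i', 'j']]
print(make_batches(texts, max_items=2, max_tokens=4))
# [['a b c'], ['d e'], ['f g h i'], ['j']]
```

Checking the token budget before appending (rather than after) is what keeps every multi-text batch within the limit.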

fix: add version check for embeddinggemma-300m (#3189)

2093798

add version check
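The PR does not say which package or minimum version the embeddinggemma-300m check uses. The sketch below shows the general pattern under that uncertainty: read the installed version with `importlib.metadata.version` and compare it to a minimum. The helper names are hypothetical. The simplified parser keeps only the leading numeric release segments, so pre-releases such as `5.1.0rc1` compare equal to `5.1.0`. `packaging.version.Version` implements full PEP 440 ordering if that matters.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release part, e.g. '4.56.0.dev0' -> (4, 56, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so '4.56' equals '4.56.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def require_min_version(package: str, minimum: str) -> None:
    """Raise ImportError if `package` is missing or older than `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError as err:
        raise ImportError(f"{package} is required (>= {minimum})") from err
    if is_older(installed, minimum):
        raise ImportError(
            f"{package} {installed} is installed, but >= {minimum} is required"
        )
```

A model loader would call something like `require_min_version("<package>", "<minimum>")` before instantiating the model, so users get a clear error instead of a failure deep inside model loading.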

dataset: Added a set of closed datasets (#3186)

bc303ad

* Add 12 more closed datasets Extend the RTEB benchmarks * trust_remote_code * trust_remote_code * Enabling JapaneseCode1Retrieval in the RTEB benchmarks * Add closed datasets as private tasks * Correct due to the comment

Update tasks & benchmarks tables

d682c85

fix: Edit ack & sponsors (#3187)

57ffd43

Update tasks & benchmarks tables

7266873

1.38.55

6811486

Automatically generated by python-semantic-release

fix: Add conflicting dependencies to toml (#3191)

0cc6802

fix conflicting dependencies

semantic-release and others added 3 commits September 18, 2025 14:03

1.38.56

3306aeb

Automatically generated by python-semantic-release

fix

1e4264e

Samoed requested a review from KennethEnevoldsen September 20, 2025 11:49

Samoed added the v2 label Sep 20, 2025

Samoed added 9 commits September 20, 2025 14:56

add dapfam

c49d800

add stats

9377c16

fix descriptive_stats package

5ff882a

fix descriptive stat loading

f1dd493

fix descriptive stat loading

a1e7f14

fix dapfam

cb9fc29

add stats

52c8873

move models files

83885c5

fix training datasets

93541d2

KennethEnevoldsen reviewed Sep 22, 2025


KennethEnevoldsen approved these changes Sep 22, 2025


KennethEnevoldsen merged commit 7c46163 into v2.0.0 Sep 22, 2025
10 checks passed

KennethEnevoldsen deleted the merge_main_20_09 branch September 22, 2025 15:24

Samoed mentioned this pull request Oct 14, 2025

Recompute descriptive stats with new format #3279

Closed

[v2] Merge main 20 09 #3193

KennethEnevoldsen merged 42 commits into v2.0.0 from merge_main_20_09


11 participants
