
[v2] Merge main 20 09 #3193

Merged
KennethEnevoldsen merged 42 commits into v2.0.0 from merge_main_20_09
Sep 22, 2025

@Samoed Samoed commented Sep 20, 2025

Member

No description provided.

fzoll and others added 30 commits September 1, 2025 09:41
Automatically generated by python-semantic-release
* align task prompt dict with `PromptType`

* use value instead of enum
Automatically generated by python-semantic-release
…3090)

* Add ModelMeta for OrdalieTech/Solon-embeddings-mini-beta-1.1

* Add training_datasets (common_corpus, fineweb, wiki_fr, private LLM-synth)

* Format with ruff + add loader per review

* Apply ruff format/fixes

* Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Register OrdalieTech/Solon-embeddings-mini-beta-1.1 in overview (ModelMeta + loader)

* Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* fix import

* Add memory_usage_mb=808.0 and required fields to ModelMeta

* Fix parameter count (210 million)

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* - Added an include_private parameter to the get_tasks() function that defaults to False
  - This ensures that by default, tests only run on public datasets
  - Tests can explicitly set include_private=True when needed to test private datasets

  - Added is_public: bool | None = None field to TaskMetadata
  - The field is optional and defaults to None (treated as public)
  - Updated the is_filled() method to exclude is_public from required fields
  - Added documentation

* - Added an include_private parameter to the get_tasks() function that defaults to False
  - This ensures that by default, tests only run on public datasets
  - Tests can explicitly set include_private=True when needed to test private datasets

  - Added is_public: bool | None = None field to TaskMetadata
  - The field is optional and defaults to None (treated as public)
  - Updated the is_filled() method to exclude is_public from required fields
  - Added documentation
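
The default-public behaviour described in this commit can be illustrated with a small, self-contained sketch. This is not mteb's code: `SketchTaskMetadata` and `select_tasks` are hypothetical stand-ins for `TaskMetadata` and `get_tasks()`, and only the `is_public` / `include_private` semantics stated above are modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTaskMetadata:
    """Hypothetical stand-in for TaskMetadata, reduced to the fields needed here."""

    name: str
    # Optional field: None is treated the same as True (public).
    is_public: bool | None = None


def select_tasks(
    all_tasks: list[SketchTaskMetadata], include_private: bool = False
) -> list[SketchTaskMetadata]:
    """Hypothetical stand-in for get_tasks(): drop private tasks unless asked for."""
    if include_private:
        return list(all_tasks)
    # Only an explicit is_public=False marks a task as private.
    return [task for task in all_tasks if task.is_public is not False]


tasks = [
    SketchTaskMetadata("PublicByDefault"),
    SketchTaskMetadata("ExplicitlyPublic", is_public=True),
    SketchTaskMetadata("Private", is_public=False),
]
default_names = [task.name for task in select_tasks(tasks)]
all_names = [task.name for task in select_tasks(tasks, include_private=True)]
```

With the default arguments only the two public tasks are returned; passing `include_private=True` returns all three.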

* Correcting due to comments

* Update mteb/abstasks/TaskMetadata.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/overview.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Removing the not used filter_tasks_by_privacy function

* Correcting due to comments

* Correcting due to comments

* Correcting due to comments

* Removing the test case

* Rename the include_private parameter to exclude_private

* Rename the include_private parameter to exclude_private

* Add private tasks tests

* Add private tasks tests

* Update tests/test_tasks/test_private_tasks.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Add private tasks tests

* Add private tasks tests

* Add private tasks tests

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Automatically generated by python-semantic-release
test out GH models for welcoming newcomers
* add dataset check on new PR

* add extract datasets

* run as module

* update startswith

* update workflow name

* add GitPython

* export var

* same shell session

* address review comments

* add to docs to say what this script does

* add docs
* add youtu models

* add a blank line

* fix the optional dependencies and lint the code

* remove unused dependencies and reformat

* revise prompt_type

---------

Co-authored-by: springxchen <springxchen@tencent.com>
* Adding quantization support

* Update mteb/models/voyage_models.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/model_meta.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update mteb/model_meta.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Simplifying the quantization/output_dtype

* Update mteb/model_meta.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
* model: EmbeddingGemma 300M

* Add license and revision
* feat - remove special filtering, keep zero-shot, keep borda rank

* feat - remove get_rteb_benchmark.py

* feat - delete get_rteb_benchmark.py;RTEB_BENCHMARK_ENTRIES changes

* feat - format

* Update mteb/load_results/benchmark_results.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

---------

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Automatically generated by python-semantic-release
* chore: add 'Patent retrieval' subtype to TaskMetadata

* feat(retrieval): add DAPFAM patent retrieval tasks (+18 variants)

* Dapfam patent retrieval PR #2946 : refactor DAPFAM tasks (explicit classes, license, metadata, custom definition explanation ...)

* Dapfam patent retrieval PR #2946 : refactor DAPFAM tasks (explicit classes, license, metadata, custom definition explanation ...)

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Changes :

- Added possibility to opt in or out of quantization through the "quantize" argument.
- Added possibility to compute raw dot product without normalization. (to reproduce the paper method the "similarity" argument should be "cosine").
- Removed unnecessary function and overhauled the task descriptions to be clearer.
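
To illustrate why the choice between raw dot product and cosine similarity matters, here is a minimal pure-Python sketch. It is not the DAPFAM task code: the helper names `dot`, `cosine`, and `rank`, as well as the toy vectors, are made up for this example.

```python
import math


def dot(a, b):
    """Raw dot product, no normalization."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product of the L2-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank(query, docs, score):
    """Return document ids sorted by descending score against the query."""
    return sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)


query = [1.0, 0.0]
docs = {
    "long": [3.0, 3.0],     # large norm, 45 degrees away from the query
    "aligned": [1.0, 0.0],  # unit norm, same direction as the query
}

dot_ranking = rank(query, docs, dot)
cosine_ranking = rank(query, docs, cosine)
```

The dot product favours the vector with the larger norm, while cosine similarity favours the one pointing in the same direction as the query, so the two rankings differ on this toy data.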

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Changes made :
- Overhauled task descriptions as well as naming to conform with the naming scheme of mteb retrieval tasks.
- Similarity is now computed using the similarity function of the passed model.
- Changed optional quantization method to conform with sentence transformers similarity function.

To reproduce the paper metrics, one can use the following snippet:

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "Snowflake/snowflake-arctic-embed-m-v2.0"

# Load the model in half precision on the GPU and switch to inference mode.
model = (
    SentenceTransformer(
        model_name,
        model_kwargs={"torch_dtype": "float16"},
        trust_remote_code=True,
    )
    .cuda()
    .eval()
)

tasks = mteb.get_tasks(
    tasks=[
        "DAPFAMInTitlAbsToTitlAbsClmRetrieval",
        "DAPFAMAllTitlAbsToTitlAbsClmRetrieval",
        "DAPFAMOutTitlAbsToTitlAbsClmRetrieval",
        # add the other 3 remaining tasks ...
    ]
)

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model,
    output_folder=f"mteb_res/{model_name}",
    # If set to False or not set, the obtained ndcg@10 and map@10 will be ~0.001 higher.
    quantize=True,
    encode_kwargs={"batch_size": 32},
)
```

* changed default value of quantization to false

* added the import to all DAPFAM tasks; tested that the  works; verified compliance with the checklist

* Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* added revision numbers to all dataset loading operations as well as the metadata itself

* intermediate changes, refresh local branch

* intermediate changes, refresh local branch again

* scale back to standard evaluation with empty set exclusion, various cosmetic/formatting changes

* minor cosmetic/formatting changes

* fixed main metric to be ndcg_at_100 as in the paper
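
For reference, nDCG@k with binary relevance can be sketched as below. This is a generic textbook formulation, not the evaluator used by the DAPFAM tasks. The function names are hypothetical, and reading "empty set exclusion" as "queries without any relevant document are skipped when averaging" is an assumption.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=100):
    """Binary-relevance nDCG@k; returns None when the query has no relevant documents.

    Assumes ranked_ids contains no duplicate document ids.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return None  # assumption: such queries are excluded from the average
    # Discounted gain of the relevant documents found in the top k (ranks start at 1).
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # Ideal DCG: all relevant documents placed at the top of the ranking.
    idcg = sum(1.0 / math.log2(position + 2) for position in range(min(len(relevant), k)))
    return dcg / idcg


def mean_ndcg_at_k(runs, qrels, k=100):
    """Average nDCG@k over queries that have at least one relevant document."""
    scores = []
    for query_id, ranked_ids in runs.items():
        score = ndcg_at_k(ranked_ids, qrels.get(query_id, ()), k)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


runs = {"q1": ["a", "b"], "q2": ["x", "y"]}
qrels = {"q1": ["a"], "q2": []}  # q2 has an empty relevant set and is skipped
mean_score = mean_ndcg_at_k(runs, qrels, k=100)
```

Here `q2` contributes nothing to the mean, so the result depends only on `q1`, whose single relevant document is ranked first.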

* removed old code artifacts from previous versions

* read appropriate loading arguments from task metadata, remove unnecessary class attribute

* reformat bibtex (remark on the assertion: it tries to match a literal string instead of bibtex formatting, and the expected format is inconsistent with the arXiv default), fixed metadata, parameters read from task metadata, all tests passed

* refactor data loading to read from metadata class attributes

---------

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
#3168)

* Adding JapaneseCode1Retrieval as the first non-public dataset

* Transformed dataset

* Adding as private dataset to tests

* Correct the private task test

* Use the sample dataset as a reference

* Use the sample dataset as a reference

* fix ds loading

* allow on forks

* update action

* remove paths

* try to trigger ci

* add ref

* add permissions

* remove paths

* add paths back

* get back to pull request

* rollback action

* Trying to resolve the token/secret problem

* Trying to resolve the token/secret problem

* Update dataset_loading_pr.yml

* Update dataset_loading_pr.yml

* Try the latest datasets package (worked for me)

* Try the latest datasets package (worked for me)

* Try the latest datasets package (worked for me)

* (last?) try

* (last?) try

* (last?) try

* Reverting the changes

* Exclude the private datasets from tests

* Apply suggestions from code review

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Solomatin Roman <samoed.roman@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* Add 12 more closed datasets
Extend the RTEB benchmarks

* trust_remote_code

* trust_remote_code

* Enabling JapaneseCode1Retrieval in the RTEB benchmarks

* Add closed datasets as private tasks

* Correct due to the comment
* Update benchmark to version 2

* make others in benchmark selector one line code

* small changes

* update a few tasks metadata

* update faintent license with correct form

* remove redundant trust remote codes

* fix hardnegatives revision

* make lint

* fix errors

* apply suggestions

* fix citation problem

* add PR link to benchmark desc

* remove duplicate dataset names in mcinext_models

* update prompts

---------

Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Automatically generated by python-semantic-release
semantic-release and others added 3 commits September 18, 2025 14:03
Automatically generated by python-semantic-release
# Conflicts:
#	mteb/abstasks/TaskMetadata.py
#	mteb/custom_validators.py
#	mteb/leaderboard/app.py
#	mteb/leaderboard/text_segments.py
#	mteb/models/model_implementations/google_models.py
#	mteb/models/model_implementations/voyage_models.py
#	mteb/models/overview.py
#	mteb/tasks/Classification/fas/FaMTEBClassification.py
#	mteb/tasks/Retrieval/__init__.py
#	pyproject.toml
#	scripts/extract_model_names.py
#	tests/test_models/test_model_meta.py
@@ -0,0 +1,40 @@
name: Welcome New Contributors

Contributor

Ahh didn't see this one - looking forward to seeing it in action

Member Author

Sadly, it's not working (#3117)

Comment thread docs/benchmarks.md

@KennethEnevoldsen KennethEnevoldsen Sep 22, 2025

Contributor

unsure if we should delete these in v2

Member Author

I think we can delete this, because we have a site with all the documentation

@KennethEnevoldsen KennethEnevoldsen left a comment

Contributor

Looks good, no particular worries

@KennethEnevoldsen KennethEnevoldsen merged commit 7c46163 into v2.0.0 Sep 22, 2025
10 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the merge_main_20_09 branch September 22, 2025 15:24