[v2] Merge main 20 09 #3193
Merged
* align task prompt dict with `PromptType` * use value instead of enum
…3090) * Add ModelMeta for OrdalieTech/Solon-embeddings-mini-beta-1.1 * Add training_datasets (common_corpus, fineweb, wiki_fr, private LLM-synth) * Format with ruff + add loader per review * Apply ruff format/fixes * Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Register OrdalieTech/Solon-embeddings-mini-beta-1.1 in overview (ModelMeta + loader) * Update mteb/models/ordalietech_solon_embeddings_mini_beta_1_1.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * fix import * Add memory_usage_mb=808.0 and required fields to ModelMeta * Fix 210 milions of parameters --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
* - Added an include_private parameter to the get_tasks() function that defaults to False - This ensures that by default, tests only run on public datasets - Tests can explicitly set include_private=True when needed to test private datasets - Added is_public: bool | None = None field to TaskMetadata - The field is optional and defaults to None (treated as public) - Updated the is_filled() method to exclude is_public from required fields - Added documentation * - Added an include_private parameter to the get_tasks() function that defaults to False - This ensures that by default, tests only run on public datasets - Tests can explicitly set include_private=True when needed to test private datasets - Added is_public: bool | None = None field to TaskMetadata - The field is optional and defaults to None (treated as public) - Updated the is_filled() method to exclude is_public from required fields - Added documentation * Correcting due to comments * Update mteb/abstasks/TaskMetadata.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/overview.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Removing the not used filter_tasks_by_privacy function * Correcting due to comments * Correcting due to comments * Correcting due to comments * Removing the test case * Rename the include_private parameter to exclude_private * Rename the include_private parameter to exclude_private * Add private tasks tests * Add private tasks tests * Update tests/test_tasks/test_private_tasks.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Add private tasks tests * Add private tasks tests * Add private tasks tests --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
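The filtering behaviour described in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual mteb implementation: `TaskMeta` and `filter_tasks` are hypothetical stand-ins, and the default `exclude_private=True` is an assumption carried over from the original `include_private=False` default after the rename.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskMeta:
    """Hypothetical stand-in for TaskMetadata, reduced to the fields used here."""

    name: str
    is_public: bool | None = None  # None is treated as public


def filter_tasks(tasks: list[TaskMeta], exclude_private: bool = True) -> list[TaskMeta]:
    """Hypothetical helper: drop tasks explicitly marked private unless asked not to."""
    if not exclude_private:
        return list(tasks)
    # Only an explicit is_public=False marks a task as private.
    return [task for task in tasks if task.is_public is not False]


tasks = [
    TaskMeta("PublicTask", is_public=True),
    TaskMeta("UnsetTask"),  # is_public left as None
    TaskMeta("PrivateTask", is_public=False),
]
default_names = [task.name for task in filter_tasks(tasks)]
all_names = [task.name for task in filter_tasks(tasks, exclude_private=False)]
```

Under these assumptions, tests that need private datasets would pass `exclude_private=False` explicitly, while everything else sees only public tasks.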
test out GH models with welcoming new comers
* add dataset check on new PR * add extract datasets * run as module * update startswith * update workflow name * add GitPython * export var * same shell session * address review comments * add to docs to say what this script does * add docs
* add youtu models * add a blank line * fix the optional dependencies and lint the code * remove unused dependencies and reformat * revise prompt_type --------- Co-authored-by: springxchen <springxchen@tencent.com>
* Adding quantization support * Update mteb/models/voyage_models.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Update mteb/model_meta.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * Simplifying the quantization/output_dtype * Update mteb/model_meta.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* model: EmbeddingGemma 300M * Add license and revision
* feat - remove special filtering, keep zero-shot, keep borda rank * feat - remove get_rteb_benchmark.py * feat - delete get_rteb_benchmark.py;RTEB_BENCHMARK_ENTRIES changes * feat -format * Update mteb/load_results/benchmark_results.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> --------- Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* chore: add 'Patent retrieval' subtype to TaskMetadata * feat(retrieval): add DAPFAM patent retrieval tasks (+18 variants) * Dapfam patent retrieval PR #2946 : refactor DAPFAM tasks (explicit classes, license, metadata, custom definition explanation ...) * Dapfam patent retrieval PR #2946 : refactor DAPFAM tasks (explicit classes, license, metadata, custom definition explanation ...) * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Changes : - Added possibility to opt in or out of quantization through the "quantize" argument. - Added possibility to compute raw dot product without normalization. (to reproduce the paper method the "similarity" argument should be "cosine"). - Removed unecessary function and overhauled the tasks descriptions to be more clear. * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> * Changes made : - Overhauled task descriptions as well as naming to conform with the naming scheme of mteb retrieval tasks. 
- Similarity is now computed using the similarity function of the passed model. - Changed the optional quantization method to conform with the sentence transformers similarity function. To reproduce the paper metrics, one can use the following snippet:

```python
import mteb
from sentence_transformers import SentenceTransformer

model_name = "Snowflake/snowflake-arctic-embed-m-v2.0"
model = (
    SentenceTransformer(
        model_name,
        model_kwargs={"torch_dtype": "float16"},
        trust_remote_code=True,
    )
    .cuda()
    .eval()
)

tasks = mteb.get_tasks(
    tasks=[
        "DAPFAMInTitlAbsToTitlAbsClmRetrieval",
        "DAPFAMAllTitlAbsToTitlAbsClmRetrieval",
        "DAPFAMOutTitlAbsToTitlAbsClmRetrieval",
        # add the other 3 remaining tasks ...
    ]
)
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model,
    output_folder=f"mteb_res/{model_name}",
    # if set to False or not set, the obtained ndcg@10 and map@10 will be ~0.001 higher
    quantize=True,
    encode_kwargs={"batch_size": 32},
)
```

* changed default value of quantization to false * added the import to all DAPFAM tasks; tested that the works; verified compliance with the checklist * Update mteb/tasks/Retrieval/eng/DAPFAMPatentRetrieval.py Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> * added revision numbers to all dataset loading operations as well as the metadata itself * intermediate changes, refresh local branch * intermediate changes, refresh local branch again * scale back to standard evaluation with empty set exclusion, various cosmetic/formatting changes * minor cosmetic/formatting changes * fixed main metric to be ndcg_at_100 as in the paper * removed old code artifacts from previous versions * read appropriate loading arguments from task metadata, remove unnecessary class attribute * reformat bibtex (remark on the assertion since it tries to match literal string instead of bibtex formatting, format inconsistent with arXiv default), fixed metadata, parameters read from task metadata, all tests passed * refactor data loading to read from
metadata class attributes --------- Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
…3185) Correct the batch creation
#3168) * Adding JapaneseCode1Retrieval as the first non-public dataset * Transformed dataset * Adding as private dataset to tests * Correct the private task test * Use the sample dataset as a reference * Use the sample dataset as a reference * fix ds loading * allow on forks * upd aciton * remove paths * try to trigger ci * add ref * add permissions * remove paths * add paths back * get back to pull request * rollback action * Trying to resolve the token/secret problem * Trying to resolve the token/secret problem * Update dataset_loading_pr.yml * Update dataset_loading_pr.yml * Try the latest datasets package (worked for me) * Try the latest datasets package (worked for me) * Try the latest datasets package (worked for me) * (last?) try * (last?) try * (last?) try * Reverting the changes * Exclude the private datasets from tests * Apply suggestions from code review --------- Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> Co-authored-by: Solomatin Roman <samoed.roman@gmail.com> Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
add version check
* Add 12 more closed datasets Extend the RTEB benchmarks * trust_remote_code * trust_remote_code * Enabling JapaneseCode1Retrieval in the RTEB benchmarks * Add closed datasets as private tasks * Correct due to the comment
* Update benchmark to version 2 * make others in benchmark selector one line code * small changes * update a few tasks metadata * update faintent license with correct form * remove redundant trust remote codes * fix hardnegatives revision * make lint * fix errors * apply suggestions * fix citation problem * add PR link to benchmark desc * remove duplicate dataset names in mcinext_models * update prompts --------- Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
fix conflict dependencies
# Conflicts: # mteb/abstasks/TaskMetadata.py # mteb/custom_validators.py # mteb/leaderboard/app.py # mteb/leaderboard/text_segments.py # mteb/models/model_implementations/google_models.py # mteb/models/model_implementations/voyage_models.py # mteb/models/overview.py # mteb/tasks/Classification/fas/FaMTEBClassification.py # mteb/tasks/Retrieval/__init__.py # pyproject.toml # scripts/extract_model_names.py # tests/test_models/test_model_meta.py
@@ -0,0 +1,40 @@
name: Welcome New Contributors
Contributor
Ahh didn't see this one - looking forward to seeing it in action
Contributor
unsure if we should delete these in v2
Member
Author
I think we can delete this, because we have site with all documentation
KennethEnevoldsen
approved these changes
Sep 22, 2025
KennethEnevoldsen
left a comment
Contributor
Looks good no particular worries
No description provided.