qzhou-embedding results by PennyYu123 · Pull Request #249 · embeddings-benchmark/results

PennyYu123 · 2025-08-02T03:13:44Z

Checklist

My model has a model sheet, report or similar
My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR ___
The results submitted were obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
mteb.get_model(model_name, revision) and
mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks.
The model is public, i.e. it is available either as an API or the weights are publicly available to download
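
The two loading calls named in the checklist can be exercised with a short script. The following is a minimal sketch, assuming the `mteb` package is installed and the model is registered in it; the helper name `check_model_loads` is hypothetical and not part of mteb.

```python
def check_model_loads(model_name, revision=None):
    """Run the two loading calls named in the checklist.

    `check_model_loads` is a hypothetical helper for illustration.
    `mteb` is imported lazily so that defining the helper does not
    require the package to be installed.
    """
    import mteb  # third-party: pip install mteb

    # Metadata lookup: fails if no ModelMeta is registered under this name.
    meta = mteb.get_model_meta(model_name, revision)
    # Full load: may download weights or set up an API client.
    model = mteb.get_model(model_name, revision)
    return meta, model
```

Calling `check_model_loads("Kingsoft-LLM/QZhou-Embedding")` would be the check for this submission. It needs network access, and it only succeeds once the implementation is merged into mteb and the model is reachable.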

github-actions · 2025-08-02T03:19:53Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding
Tasks: AFQMC, ATEC, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, BQ, Banking77Classification, BiorxivClusteringP2P.v2, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, IFlyTek, ImdbClassification, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedicalRetrieval, MedrxivClusteringP2P, MedrxivClusteringS2S, MindSmallReranking, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, T2Reranking, T2Retrieval, TNews, TRECCOVID, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus, VideoRetrieval, Waimai

Results for `Kingsoft-LLM/QZhou-Embedding`

task_name	Kingsoft-LLM/QZhou-Embedding	google/gemini-embedding-001	intfloat/multilingual-e5-large	Max result
AFQMC	0.72	nan	0.33	0.72
ATEC	0.58	nan	0.4	0.65
AmazonCounterfactualClassification	0.97	0.88	0.7	0.97
ArXivHierarchicalClusteringP2P	0.65	0.65	0.56	0.69
ArXivHierarchicalClusteringS2S	0.63	0.64	0.54	0.65
ArguAna	0.77	0.86	0.54	0.90
AskUbuntuDupQuestions	0.77	0.64	0.59	0.77
BIOSSES	0.97	0.89	0.85	0.97
BQ	0.84	nan	0.48	0.84
Banking77Classification	0.83	0.94	0.75	0.94
BiorxivClusteringP2P.v2	0.70	0.54	0.37	0.70
CLSClusteringP2P	0.78	nan	nan	0.82
CLSClusteringS2S	0.76	nan	nan	0.76
CMedQAv1-reranking	0.95	nan	0.68	0.95
CMedQAv2-reranking	0.95	nan	0.67	0.95
CQADupstackGamingRetrieval	0.75	0.71	0.59	0.79
CQADupstackUnixRetrieval	0.70	0.54	0.4	0.72
ClimateFEVERHardNegatives	0.52	0.31	0.26	0.52
CmedqaRetrieval	0.52	nan	0.29	0.57
Cmnli	0.94	nan	nan	0.94
CovidRetrieval	0.93	0.79	0.76	0.96
DuRetrieval	0.82	nan	0.85	0.94
EcomRetrieval	0.75	nan	0.55	0.78
FEVERHardNegatives	0.83	0.89	0.84	0.95
FiQA2018	0.63	0.62	0.44	0.80
HotpotQAHardNegatives	0.72	0.87	0.71	0.87
IFlyTek	0.72	nan	0.42	0.72
ImdbClassification	0.99	0.95	0.89	0.99
JDReview	0.95	nan	0.81	0.95
LCQMC	0.82	nan	0.76	0.82
MMarcoReranking	0.28	nan	0.29	0.47
MMarcoRetrieval	0.77	nan	0.79	0.90
MTOPDomainClassification	0.99	0.98	0.9	1.00
MassiveIntentClassification	0.74	0.82	0.6	0.92
MassiveScenarioClassification	0.88	0.87	0.7	0.99
MedicalRetrieval	0.73	nan	0.51	0.76
MedrxivClusteringP2P	0.74	nan	0.32	0.74
MedrxivClusteringS2S	0.72	nan	0.3	0.72
MindSmallReranking	0.35	0.33	0.3	0.35
MultilingualSentiment	0.97	nan	0.71	0.97
Ocnli	0.96	nan	nan	0.96
OnlineShopping	0.99	nan	0.9	0.99
PAWSX	0.80	nan	0.15	0.80
QBQTC	0.58	nan	nan	0.71
SCIDOCS	0.54	0.25	0.17	0.54
SICK-R	0.90	0.83	0.8	0.95
STS12	0.92	0.82	0.8	0.95
STS13	0.96	0.90	0.82	0.98
STS14	0.93	0.85	0.78	0.98
STS15	0.95	0.90	0.89	0.98
STS17	0.91	0.89	0.82	0.93
STS22.v2	0.87	0.72	0.64	0.87
STSB	0.96	0.85	0.82	0.96
STSBenchmark	0.96	0.89	0.87	0.96
SprintDuplicateQuestions	0.99	0.97	0.93	0.99
StackExchangeClustering.v2	0.82	0.92	0.46	0.92
StackExchangeClusteringP2P.v2	0.57	0.51	0.39	0.57
SummEvalSummarization.v2	0.79	0.38	0.31	0.79
T2Reranking	0.68	0.68	0.66	0.73
T2Retrieval	0.79	nan	0.76	0.89
TNews	0.88	nan	0.49	0.88
TRECCOVID	0.76	0.86	0.71	0.95
ThuNewsClusteringP2P	0.91	nan	nan	0.91
ThuNewsClusteringS2S	0.88	nan	nan	0.88
Touche2020Retrieval.v3	0.55	0.52	0.5	0.75
ToxicConversationsClassification	0.97	0.89	0.66	0.98
TweetSentimentExtractionClassification	0.94	0.70	0.63	0.94
TwentyNewsgroupsClustering.v2	0.91	0.57	0.39	0.91
TwitterSemEval2015	0.92	0.79	0.75	0.92
TwitterURLCorpus	0.93	0.87	0.86	0.96
VideoRetrieval	0.72	nan	0.58	0.84
Waimai	0.98	nan	0.86	0.98
Average	0.80	0.75	0.61	0.84
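
The summary figures in a comparison like this can be reproduced from the per-task rows. The sketch below uses four rows copied from the table and assumes that `nan` entries are skipped when averaging; the page does not state how the bot computes the Average row, so treat that as an assumption. The helper names are hypothetical.

```python
import math

# Four rows copied from the table above:
# (task, QZhou-Embedding, gemini-embedding-001, multilingual-e5-large, max result)
ROWS = [
    ("AFQMC", 0.72, math.nan, 0.33, 0.72),
    ("ArguAna", 0.77, 0.86, 0.54, 0.90),
    ("BIOSSES", 0.97, 0.89, 0.85, 0.97),
    ("DuRetrieval", 0.82, math.nan, 0.85, 0.94),
]


def nan_mean(values):
    """Mean over the non-nan entries; nan if every entry is missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


def column_means(rows):
    """Per-column nan-skipping means, in the table's column order."""
    columns = zip(*[row[1:] for row in rows])
    return [nan_mean(col) for col in columns]


def tasks_at_max(rows, column=1):
    """Tasks where the chosen score column equals the 'Max result' column."""
    return [row[0] for row in rows if row[column] == row[-1]]


means = column_means(ROWS)
leaders = tasks_at_max(ROWS)
```

The scores are rounded to two decimals, so `tasks_at_max` compares rounded values and can report a tie that the unrounded scores would break.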

KennethEnevoldsen · 2025-08-02T12:43:52Z

Hi @PennyYu123, I can neither find this model on Huggingface nor an implementation within the mteb package

Samoed · 2025-08-02T12:45:30Z

@KennethEnevoldsen It was added in this PR embeddings-benchmark/mteb#2965, but I can't find it on HF either

KennethEnevoldsen · 2025-08-02T17:24:22Z

Given that the model submission was closed, I will also close this. Feel free to re-open if you resubmit the model

PennyYu123 · 2025-08-02T17:24:58Z

I have corrected the wrong info in model_meta in the latest mteb PR and updated the model implementation code. Our Qingzhou model is currently private and will be made public on HF tomorrow.

KennethEnevoldsen · 2025-08-02T18:05:51Z

Ah, that is great. Do feel free to reopen the PR once the implementation has been reviewed

qzhou-embedding results

6611eda

KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Aug 2, 2025

KennethEnevoldsen closed this Aug 2, 2025

PennyYu123 deleted the qzhou-embedding-results branch August 3, 2025 03:01

PennyYu123 mentioned this pull request Aug 4, 2025

Qzhou embedding results #250

Merged

4 tasks

PennyYu123 wants to merge 1 commit into embeddings-benchmark:main from PennyYu123:qzhou-embedding-results

3 participants
