Qzhou embedding results by PennyYu123 · Pull Request #250 · embeddings-benchmark/results

PennyYu123 · 2025-08-03T04:13:18Z

We have released the HF model publicly and resubmitted the mteb implementation.

Checklist

My model has a model sheet, report or similar
My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR ___
The results submitted were obtained using the reference implementation
[x] My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
[x] I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

github-actions · 2025-08-03T04:19:19Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding
Tasks: AFQMC, ATEC, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, BQ, Banking77Classification, BiorxivClusteringP2P.v2, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, IFlyTek, ImdbClassification, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedicalRetrieval, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MindSmallReranking, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, T2Reranking, T2Retrieval, TNews, TRECCOVID, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus, VideoRetrieval, Waimai

Results for `Kingsoft-LLM/QZhou-Embedding`

task_name	Kingsoft-LLM/QZhou-Embedding	google/gemini-embedding-001	intfloat/multilingual-e5-large	Max result
AFQMC	0.67	nan	0.33	0.72
ATEC	0.55	nan	0.4	0.65
AmazonCounterfactualClassification	0.93	0.88	0.7	0.97
ArXivHierarchicalClusteringP2P	0.66	0.65	0.56	0.69
ArXivHierarchicalClusteringS2S	0.64	0.64	0.54	0.65
ArguAna	0.84	0.86	0.54	0.90
AskUbuntuDupQuestions	0.69	0.64	0.59	0.70
BIOSSES	0.93	0.89	0.85	0.97
BQ	0.77	nan	0.48	0.81
Banking77Classification	0.85	0.94	0.75	0.94
BiorxivClusteringP2P.v2	0.54	0.54	0.37	0.56
CLSClusteringP2P	0.65	nan	nan	0.82
CLSClusteringS2S	0.61	nan	nan	0.74
CMedQAv1-reranking	0.94	nan	0.68	0.94
CMedQAv2-reranking	0.94	nan	0.67	0.94
CQADupstackGamingRetrieval	0.76	0.71	0.59	0.79
CQADupstackUnixRetrieval	0.71	0.54	0.4	0.72
ClimateFEVERHardNegatives	0.49	0.31	0.26	0.49
CmedqaRetrieval	0.52	nan	0.29	0.57
Cmnli	0.95	nan	nan	0.95
CovidRetrieval	0.93	0.79	0.76	0.96
DuRetrieval	0.92	nan	0.85	0.94
EcomRetrieval	0.77	nan	0.55	0.78
FEVERHardNegatives	0.94	0.89	0.84	0.95
FiQA2018	0.60	0.62	0.44	0.80
HotpotQAHardNegatives	0.81	0.87	0.71	0.87
IFlyTek	0.58	nan	0.42	0.58
ImdbClassification	0.96	0.95	0.89	0.97
JDReview	0.88	nan	0.81	0.92
LCQMC	0.82	nan	0.76	0.82
MMarcoReranking	0.44	nan	0.29	0.47
MMarcoRetrieval	0.83	nan	0.79	0.90
MTOPDomainClassification	0.96	0.98	0.9	1.00
MassiveIntentClassification	0.55	0.82	0.6	0.92
MassiveScenarioClassification	0.74	0.87	0.7	0.99
MedicalRetrieval	0.73	nan	0.51	0.76
MedrxivClusteringP2P.v2	0.50	0.47	0.34	0.52
MedrxivClusteringS2S.v2	0.48	0.45	0.32	0.51
MindSmallReranking	0.34	0.33	0.3	0.34
MultilingualSentiment	0.85	nan	0.71	0.85
Ocnli	0.95	nan	nan	0.95
OnlineShopping	0.96	nan	0.9	0.97
PAWSX	0.70	nan	0.15	0.70
QBQTC	0.60	nan	nan	0.71
SCIDOCS	0.29	0.25	0.17	0.35
SICK-R	0.88	0.83	0.8	0.95
STS12	0.90	0.82	0.8	0.95
STS13	0.96	0.90	0.82	0.98
STS14	0.93	0.85	0.78	0.98
STS15	0.95	0.90	0.89	0.98
STS17	0.89	0.89	0.82	0.93
STS22.v2	0.77	0.72	0.64	0.77
STSB	0.92	0.85	0.82	0.92
STSBenchmark	0.95	0.89	0.87	0.95
SprintDuplicateQuestions	0.98	0.97	0.93	0.98
StackExchangeClustering.v2	0.76	0.92	0.46	0.92
StackExchangeClusteringP2P.v2	0.55	0.51	0.39	0.55
SummEvalSummarization.v2	0.33	0.38	0.31	0.39
T2Reranking	0.68	0.68	0.66	0.73
T2Retrieval	0.82	nan	0.76	0.89
TNews	0.61	nan	0.49	0.61
TRECCOVID	0.78	0.86	0.71	0.95
ThuNewsClusteringP2P	0.82	nan	nan	0.89
ThuNewsClusteringS2S	0.76	nan	nan	0.88
Touche2020Retrieval.v3	0.50	0.52	0.5	0.75
ToxicConversationsClassification	0.90	0.89	0.66	0.98
TweetSentimentExtractionClassification	0.77	0.70	0.63	0.88
TwentyNewsgroupsClustering.v2	0.81	0.57	0.39	0.88
TwitterSemEval2015	0.87	0.79	0.75	0.89
TwitterURLCorpus	0.92	0.87	0.86	0.96
VideoRetrieval	0.79	nan	0.58	0.84
Waimai	0.92	nan	0.86	0.92
Average	0.76	0.73	0.61	0.81

correct model_meta info

PennyYu123 · 2025-08-04T04:44:42Z

past PR: #249

PennyYu123

completed

KennethEnevoldsen · 2025-08-07T14:20:45Z

We are still waiting for the model PR to merge :)

KennethEnevoldsen · 2025-08-09T14:02:18Z

Thanks! I have looked over the scores, and a few seem suspiciously high:

AmazonCounterfactualClassification
AskUbuntuDupQuestions
BQ
Waimai
TNews
IFlyTek
...

However, it seems like these are not in the annotated training data:

import mteb
meta = mteb.get_model_meta("Kingsoft-LLM/QZhou-Embedding")

# in training data
"AmazonCounterfactualClassification" in meta.training_datasets # True

# not in:
"AskUbuntuDupQuestions" in meta.training_datasets # False
"BQ" in meta.training_datasets # False
"Waimai" in meta.training_datasets # False
"TNews" in meta.training_datasets # False
"IFlyTek" in meta.training_datasets # False

@PennyYu123 can you help me figure out these scores? Could you have missed some annotations or synthetically generated matching training data?
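The kind of screening described above can be sketched without the mteb package. This is only an illustration: the reviewer's actual criterion for "suspiciously high" is not stated in the thread, so the `margin` value, the `flag_unexplained` helper, and the `ANNOTATED` stand-in for `meta.training_datasets` are all assumptions. The scores are copied from the comparison table earlier in this thread.

```python
import math

# (new model, gemini-embedding-001, multilingual-e5-large), copied from the table.
SCORES = {
    "AmazonCounterfactualClassification": (0.93, 0.88, 0.70),
    "AskUbuntuDupQuestions": (0.69, 0.64, 0.59),
    "BQ": (0.77, math.nan, 0.48),
    "Waimai": (0.92, math.nan, 0.86),
    "TNews": (0.61, math.nan, 0.49),
    "IFlyTek": (0.58, math.nan, 0.42),
    "FiQA2018": (0.60, 0.62, 0.44),
}

# Hypothetical stand-in for meta.training_datasets at the time of the comment.
ANNOTATED = {"AmazonCounterfactualClassification"}


def flag_unexplained(scores, annotated, margin=0.04):
    """Return tasks where the new model beats every available reference score
    by at least `margin` and the task is not annotated as training data."""
    flagged = []
    for task, (new, *refs) in scores.items():
        known = [r for r in refs if not math.isnan(r)]  # skip missing references
        if not known:
            continue
        if new - max(known) >= margin and task not in annotated:
            flagged.append(task)
    return sorted(flagged)


print(flag_unexplained(SCORES, ANNOTATED))
```

With these inputs the helper reports the five unannotated tasks from the list above and skips AmazonCounterfactualClassification, whose high score is already explained by the annotation.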

Samoed · 2025-08-10T15:54:06Z

If you allow, I can open another PR to mteb to add the missing training sets.

Yes, it would be great if you'd add them to the training datasets.

Samoed · 2025-08-11T12:41:27Z

You can add your new scores in a new subfolder for your new revision.
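As a rough sketch of what a per-revision subfolder could look like: the layout below (`results/<org>__<model>/<revision>/<task>.json`) is an assumption about the results repository, not something stated in this thread, and `result_path` and the revision string are hypothetical.

```python
from pathlib import PurePosixPath


def result_path(model_name, revision, task_name, root="results"):
    """Build the assumed location of one task's result file for a given revision."""
    # Assumption: the "/" in the model name is replaced by "__" in the folder name.
    folder = model_name.replace("/", "__")
    return PurePosixPath(root) / folder / revision / f"{task_name}.json"


# "new-revision" is a placeholder, not the model's real revision hash.
print(result_path("Kingsoft-LLM/QZhou-Embedding", "new-revision", "STS12"))
```

Keeping each revision in its own folder means scores from the old weights stay available for comparison instead of being overwritten.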

PennyYu123 · 2025-08-14T13:58:59Z

Hello, our new model results have been uploaded. We have already submitted a PR to the mteb repo. We have also replaced the original model parameter file on Hugging Face with our new one. Let's continue the previous process. 😊😊😊

KennethEnevoldsen · 2025-08-16T13:30:04Z

Hi @PennyYu123,

I have merged the PR, but it seems like there are still some datasets missing from the list that you provided:

import mteb
meta = mteb.get_model_meta("Kingsoft-LLM/QZhou-Embedding")
"AmazonCounterfactualClassification" in meta.training_datasets # True
"AskUbuntuDupQuestions" in meta.training_datasets # False
"BQ" in meta.training_datasets # False
"Waimai" in meta.training_datasets # True (fixed)
"TNews" in meta.training_datasets # False (fixed)
"IFlyTek" in meta.training_datasets # False
# do also check the remainder of the list

Can I ask you to update the training datasets again?
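Checking names one at a time gets tedious for a long list, so the same membership test can be run in a loop. A minimal sketch, assuming only that the annotation supports `in` (a set, or a dict keyed by task name); `missing_annotations` is a hypothetical helper, and the `training_datasets` value here is a stand-in that mirrors the True/False results shown above rather than the real `meta.training_datasets`.

```python
def missing_annotations(expected, training_datasets):
    """Return the expected task names that are absent from the annotation,
    preserving the order in which they were given."""
    annotated = set(training_datasets)  # works for a set or for dict keys
    return [name for name in expected if name not in annotated]


# Stand-in mirroring the checks in the comment above.
training_datasets = {"AmazonCounterfactualClassification", "Waimai"}
expected = [
    "AmazonCounterfactualClassification",
    "AskUbuntuDupQuestions",
    "BQ",
    "Waimai",
    "TNews",
    "IFlyTek",
]

print(missing_annotations(expected, training_datasets))
```

With the real metadata, `training_datasets` would be replaced by `meta.training_datasets` and `expected` by the full list of datasets the authors reported training on.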

KennethEnevoldsen · 2025-08-16T18:03:50Z

we restructured our training plan and trained a new model last week

Ahh, great, I will rerun the table to see if there are any remaining concerns.

Oh, I have another request. We have an important presentation on Monday, and we might need our results to be on the leaderboard. Can you get it done in the next few days?

I am back from holiday, so that should be possible. Sorry that you had to wait due to the holiday; normally, it takes no more than 1-2 days.

If you can help us complete it, we'd like to contribute more to our open source community. I'm also engaged in retrieval model research, and we may collaborate in the future.

We, of course, always appreciate collaboration and contributions, but let us keep that out of the review process :)

Samoed · 2025-08-18T14:44:13Z

@KennethEnevoldsen

KennethEnevoldsen

Ahh! forgot to press submit on the review...

I have added the updated table below. There are still a few that seem concerning:

TwitterSemEval2015
SCIDOCS
AskUbuntuDupQuestions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding
Tasks: AFQMC, ATEC, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, BQ, Banking77Classification, BiorxivClusteringP2P.v2, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, IFlyTek, ImdbClassification, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedicalRetrieval, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MindSmallReranking, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, T2Reranking, T2Retrieval, TNews, TRECCOVID, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus, VideoRetrieval, Waimai

Results for `Kingsoft-LLM/QZhou-Embedding`

task_name	Kingsoft-LLM/QZhou-Embedding	google/gemini-embedding-001	intfloat/multilingual-e5-large	Max result
AFQMC	0.66	nan	0.33	0.72
ATEC	0.55	nan	0.4	0.65
AmazonCounterfactualClassification	0.93	0.88	0.7	0.97
ArXivHierarchicalClusteringP2P	0.66	0.65	0.56	0.69
ArXivHierarchicalClusteringS2S	0.64	0.64	0.54	0.65
ArguAna	0.84	0.86	0.54	0.90
AskUbuntuDupQuestions	0.75	0.64	0.59	0.75
BIOSSES	0.93	0.89	0.85	0.97
BQ	0.77	nan	0.48	0.81
Banking77Classification	0.85	0.94	0.75	0.94
BiorxivClusteringP2P.v2	0.55	0.54	0.37	0.56
CLSClusteringP2P	0.67	nan	nan	0.82
CLSClusteringS2S	0.61	nan	nan	0.74
CMedQAv1-reranking	0.94	nan	0.68	0.94
CMedQAv2-reranking	0.93	nan	0.67	0.93
CQADupstackGamingRetrieval	0.77	0.71	0.59	0.79
CQADupstackUnixRetrieval	0.70	0.54	0.4	0.72
ClimateFEVERHardNegatives	0.62	0.31	0.26	0.62
CmedqaRetrieval	0.51	nan	0.29	0.57
Cmnli	0.95	nan	nan	0.95
CovidRetrieval	0.93	0.79	0.76	0.96
DuRetrieval	0.92	nan	0.85	0.94
EcomRetrieval	0.77	nan	0.55	0.78
FEVERHardNegatives	0.94	0.89	0.84	0.95
FiQA2018	0.60	0.62	0.44	0.80
HotpotQAHardNegatives	0.80	0.87	0.71	0.87
IFlyTek	0.57	nan	0.42	0.58
ImdbClassification	0.96	0.95	0.89	0.97
JDReview	0.90	nan	0.81	0.92
LCQMC	0.82	nan	0.76	0.82
MMarcoReranking	0.51	nan	0.29	0.51
MMarcoRetrieval	0.83	nan	0.79	0.90
MTOPDomainClassification	0.96	0.98	0.9	1.00
MassiveIntentClassification	0.55	0.82	0.6	0.92
MassiveScenarioClassification	0.73	0.87	0.7	0.99
MedicalRetrieval	0.72	nan	0.51	0.76
MedrxivClusteringP2P.v2	0.51	0.47	0.34	0.52
MedrxivClusteringS2S.v2	0.48	0.45	0.32	0.51
MindSmallReranking	0.36	0.33	0.3	0.36
MultilingualSentiment	0.85	nan	0.71	0.85
Ocnli	0.95	nan	nan	0.95
OnlineShopping	0.96	nan	0.9	0.97
PAWSX	0.70	nan	0.15	0.70
QBQTC	0.61	nan	nan	0.71
SCIDOCS	0.44	0.25	0.17	0.44
SICK-R	0.88	0.83	0.8	0.95
STS12	0.90	0.82	0.8	0.95
STS13	0.95	0.90	0.82	0.98
STS14	0.93	0.85	0.78	0.98
STS15	0.96	0.90	0.89	0.98
STS17	0.90	0.89	0.82	0.93
STS22.v2	0.78	0.72	0.64	0.78
STSB	0.92	0.85	0.82	0.92
STSBenchmark	0.96	0.89	0.87	0.96
SprintDuplicateQuestions	0.98	0.97	0.93	0.98
StackExchangeClustering.v2	0.76	0.92	0.46	0.92
StackExchangeClusteringP2P.v2	0.55	0.51	0.39	0.55
SummEvalSummarization.v2	0.34	0.38	0.31	0.39
T2Reranking	0.68	0.68	0.66	0.73
T2Retrieval	0.81	nan	0.76	0.89
TNews	0.60	nan	0.49	0.60
TRECCOVID	0.79	0.86	0.71	0.95
ThuNewsClusteringP2P	0.83	nan	nan	0.89
ThuNewsClusteringS2S	0.78	nan	nan	0.88
Touche2020Retrieval.v3	0.49	0.52	0.5	0.75
ToxicConversationsClassification	0.90	0.89	0.66	0.98
TweetSentimentExtractionClassification	0.77	0.70	0.63	0.88
TwentyNewsgroupsClustering.v2	0.82	0.57	0.39	0.88
TwitterSemEval2015	0.92	0.79	0.75	0.92
TwitterURLCorpus	0.92	0.87	0.86	0.96
VideoRetrieval	0.80	nan	0.58	0.84
Waimai	0.92	nan	0.86	0.92
Average	0.76	0.73	0.61	0.81
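Comparing this table with the first one in the thread shows which tasks moved most between the two sets of uploaded results. The sketch below uses a handful of rows copied from both tables; the `score_changes` helper and the 0.03 threshold are illustrative assumptions, not part of the review tooling.

```python
# QZhou-Embedding scores from the first table in the thread.
first_table = {
    "AFQMC": 0.67,
    "AskUbuntuDupQuestions": 0.69,
    "ClimateFEVERHardNegatives": 0.49,
    "MMarcoReranking": 0.44,
    "SCIDOCS": 0.29,
    "TwitterSemEval2015": 0.87,
}

# The same tasks in the updated table above.
updated_table = {
    "AFQMC": 0.66,
    "AskUbuntuDupQuestions": 0.75,
    "ClimateFEVERHardNegatives": 0.62,
    "MMarcoReranking": 0.51,
    "SCIDOCS": 0.44,
    "TwitterSemEval2015": 0.92,
}


def score_changes(before, after, threshold=0.03):
    """Return {task: delta} for tasks whose score moved by at least `threshold`,
    ordered from the largest absolute change to the smallest."""
    changes = {}
    for task in before.keys() & after.keys():
        delta = round(after[task] - before[task], 2)  # tables have two decimals
        if abs(delta) >= threshold:
            changes[task] = delta
    return dict(sorted(changes.items(), key=lambda kv: -abs(kv[1])))


for task, delta in score_changes(first_table, updated_table).items():
    print(f"{task}: {delta:+.2f}")
```

On these rows the three tasks named as concerning all appear among the changes, alongside ClimateFEVERHardNegatives and MMarcoReranking, while AFQMC's 0.01 shift falls below the threshold.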

KennethEnevoldsen · 2025-08-22T08:57:06Z

@PennyYu123 can you help me understand the few concerning datasets? Might there be missing dataset annotations?

PennyYu123 · 2025-08-23T17:47:15Z

We have concurrently updated the following components:
1. Hugging Face model parameters - Full refresh of embedding weights
2. MTEB model_meta - Updated release_date and revision

All upgrades are now live and operational.

KennethEnevoldsen · 2025-08-25T13:42:52Z

PR that updates the model revision: embeddings-benchmark/mteb#3069

I will recompute the table

KennethEnevoldsen · 2025-08-26T09:26:27Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding
Tasks: AFQMC, ATEC, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, BQ, Banking77Classification, BiorxivClusteringP2P.v2, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, IFlyTek, ImdbClassification, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedicalRetrieval, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MindSmallReranking, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, T2Reranking, T2Retrieval, TNews, TRECCOVID, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus, VideoRetrieval, Waimai

Results for `Kingsoft-LLM/QZhou-Embedding`

task_name	Kingsoft-LLM/QZhou-Embedding	google/gemini-embedding-001	intfloat/multilingual-e5-large	Max result
AFQMC	0.67	nan	0.33	0.72
ATEC	0.55	nan	0.4	0.65
AmazonCounterfactualClassification	0.93	0.88	0.7	0.97
ArXivHierarchicalClusteringP2P	0.66	0.65	0.56	0.69
ArXivHierarchicalClusteringS2S	0.64	0.64	0.54	0.65
ArguAna	0.84	0.86	0.54	0.90
AskUbuntuDupQuestions	0.69	0.64	0.59	0.70
BIOSSES	0.93	0.89	0.85	0.97
BQ	0.77	nan	0.48	0.81
Banking77Classification	0.85	0.94	0.75	0.94
BiorxivClusteringP2P.v2	0.54	0.54	0.37	0.56
CLSClusteringP2P	0.65	nan	nan	0.82
CLSClusteringS2S	0.61	nan	nan	0.74
CMedQAv1-reranking	0.94	nan	0.68	0.94
CMedQAv2-reranking	0.94	nan	0.67	0.94
CQADupstackGamingRetrieval	0.76	0.71	0.59	0.79
CQADupstackUnixRetrieval	0.71	0.54	0.4	0.72
ClimateFEVERHardNegatives	0.49	0.31	0.26	0.49
CmedqaRetrieval	0.52	nan	0.29	0.57
Cmnli	0.95	nan	nan	0.95
CovidRetrieval	0.93	0.79	0.76	0.96
DuRetrieval	0.92	nan	0.85	0.94
EcomRetrieval	0.77	nan	0.55	0.78
FEVERHardNegatives	0.94	0.89	0.84	0.95
FiQA2018	0.60	0.62	0.44	0.80
HotpotQAHardNegatives	0.81	0.87	0.71	0.87
IFlyTek	0.58	nan	0.42	0.58
ImdbClassification	0.96	0.95	0.89	0.97
JDReview	0.88	nan	0.81	0.92
LCQMC	0.82	nan	0.76	0.82
MMarcoReranking	0.44	nan	0.29	0.47
MMarcoRetrieval	0.83	nan	0.79	0.90
MTOPDomainClassification	0.96	0.98	0.9	1.00
MassiveIntentClassification	0.55	0.82	0.6	0.92
MassiveScenarioClassification	0.74	0.87	0.7	0.99
MedicalRetrieval	0.73	nan	0.51	0.76
MedrxivClusteringP2P.v2	0.50	0.47	0.34	0.52
MedrxivClusteringS2S.v2	0.48	0.45	0.32	0.51
MindSmallReranking	0.34	0.33	0.3	0.34
MultilingualSentiment	0.85	nan	0.71	0.85
Ocnli	0.95	nan	nan	0.95
OnlineShopping	0.96	nan	0.9	0.97
PAWSX	0.70	nan	0.15	0.70
QBQTC	0.60	nan	nan	0.71
SCIDOCS	0.29	0.25	0.17	0.35
SICK-R	0.88	0.83	0.8	0.95
STS12	0.90	0.82	0.8	0.95
STS13	0.96	0.90	0.82	0.98
STS14	0.93	0.85	0.78	0.98
STS15	0.95	0.90	0.89	0.98
STS17	0.89	0.89	0.82	0.93
STS22.v2	0.77	0.72	0.64	0.77
STSB	0.92	0.85	0.82	0.92
STSBenchmark	0.95	0.89	0.87	0.95
SprintDuplicateQuestions	0.98	0.97	0.93	0.98
StackExchangeClustering.v2	0.76	0.92	0.46	0.92
StackExchangeClusteringP2P.v2	0.55	0.51	0.39	0.55
SummEvalSummarization.v2	0.33	0.38	0.31	0.39
T2Reranking	0.68	0.68	0.66	0.73
T2Retrieval	0.82	nan	0.76	0.89
TNews	0.61	nan	0.49	0.61
TRECCOVID	0.78	0.86	0.71	0.95
ThuNewsClusteringP2P	0.82	nan	nan	0.89
ThuNewsClusteringS2S	0.76	nan	nan	0.88
Touche2020Retrieval.v3	0.50	0.52	0.5	0.75
ToxicConversationsClassification	0.90	0.89	0.66	0.98
TweetSentimentExtractionClassification	0.77	0.70	0.63	0.88
TwentyNewsgroupsClustering.v2	0.81	0.57	0.39	0.88
TwitterSemEval2015	0.87	0.79	0.75	0.89
TwitterURLCorpus	0.92	0.87	0.86	0.96
VideoRetrieval	0.79	nan	0.58	0.84
Waimai	0.92	nan	0.86	0.92
Average	0.76	0.73	0.61	0.81

KennethEnevoldsen · 2025-08-26T09:27:39Z

Alright, I think we finally got there! Congratulations again on the release :)

PennyYu123 added 2 commits August 3, 2025 12:01

add qzhou-embedding-result

d073538

add qzhou-embedding-result

9840a0c

KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Aug 3, 2025

Update model_meta.json

abc901b

correct model_meta info

PennyYu123 commented Aug 7, 2025

Samoed mentioned this pull request Aug 10, 2025

don't add model results to max result #252

Merged

PennyYu123 mentioned this pull request Aug 11, 2025

Supplement missing training sets embeddings-benchmark/mteb#3023

Merged

upload new model results

54dddd9

KennethEnevoldsen removed the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) Aug 16, 2025

KennethEnevoldsen approved these changes Aug 19, 2025

PennyYu123 requested a review from KennethEnevoldsen August 19, 2025 15:08

upload new scores

c10b131

KennethEnevoldsen merged commit 2369024 into embeddings-benchmark:main Aug 26, 2025
3 checks passed

PennyYu123 deleted the qzhou-embedding-results branch August 26, 2025 09:58
