
qzhou-embedding results#249

Closed
PennyYu123 wants to merge 1 commit into embeddings-benchmark:main from PennyYu123:qzhou-embedding-results


@PennyYu123

Contributor

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The submitted results were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on, e.g., Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)
  • I have tested that the implementation works on a representative set of tasks
  • The model is public, i.e., it is available either as an API or its weights are publicly available to download
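The loading and smoke-test items in the checklist can be exercised locally with a helper along these lines. This is a sketch, not part of mteb: it assumes the mteb 1.x API that was current when this PR was opened (`mteb.get_model_meta`, `mteb.get_model`, `mteb.get_tasks`, `mteb.MTEB(...).run`), `smoke_test` is a hypothetical name, and the two STS tasks are an arbitrary small subset of the task list evaluated in this PR.

```python
from typing import Optional, Sequence


def smoke_test(
    model_name: str,
    revision: Optional[str] = None,
    task_names: Sequence[str] = ("STS12", "BIOSSES"),
    output_folder: str = "results",
):
    """Load a model through mteb and run it on a small subset of tasks.

    Hypothetical helper assuming the mteb 1.x API. mteb is imported inside
    the function so this module can be imported without mteb installed.
    """
    import mteb

    # Look up the registered ModelMeta; this does not instantiate the model.
    meta = mteb.get_model_meta(model_name, revision=revision)
    # Instantiate the reference implementation (downloads weights if needed).
    model = mteb.get_model(model_name, revision=revision)

    # Resolve task names to task objects and evaluate the model on them.
    tasks = mteb.get_tasks(tasks=list(task_names))
    evaluation = mteb.MTEB(tasks=tasks)
    results = evaluation.run(model, output_folder=output_folder)
    return meta, results
```

Calling `smoke_test("Kingsoft-LLM/QZhou-Embedding")` would only succeed once the model's weights are publicly available on Hugging Face, which, according to the discussion in this PR, was not yet the case.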

@github-actions

github-actions Bot commented Aug 2, 2025


Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Kingsoft-LLM/QZhou-Embedding
Tasks: AFQMC, ATEC, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, BQ, Banking77Classification, BiorxivClusteringP2P.v2, CLSClusteringP2P, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CovidRetrieval, DuRetrieval, EcomRetrieval, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, IFlyTek, ImdbClassification, JDReview, LCQMC, MMarcoReranking, MMarcoRetrieval, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedicalRetrieval, MedrxivClusteringP2P, MedrxivClusteringS2S, MindSmallReranking, MultilingualSentiment, Ocnli, OnlineShopping, PAWSX, QBQTC, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, T2Reranking, T2Retrieval, TNews, TRECCOVID, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus, VideoRetrieval, Waimai

Results for Kingsoft-LLM/QZhou-Embedding

| Task | Kingsoft-LLM/QZhou-Embedding | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.72 | nan | 0.33 | 0.72 |
| ATEC | 0.58 | nan | 0.40 | 0.65 |
| AmazonCounterfactualClassification | 0.97 | 0.88 | 0.70 | 0.97 |
| ArXivHierarchicalClusteringP2P | 0.65 | 0.65 | 0.56 | 0.69 |
| ArXivHierarchicalClusteringS2S | 0.63 | 0.64 | 0.54 | 0.65 |
| ArguAna | 0.77 | 0.86 | 0.54 | 0.90 |
| AskUbuntuDupQuestions | 0.77 | 0.64 | 0.59 | 0.77 |
| BIOSSES | 0.97 | 0.89 | 0.85 | 0.97 |
| BQ | 0.84 | nan | 0.48 | 0.84 |
| Banking77Classification | 0.83 | 0.94 | 0.75 | 0.94 |
| BiorxivClusteringP2P.v2 | 0.70 | 0.54 | 0.37 | 0.70 |
| CLSClusteringP2P | 0.78 | nan | nan | 0.82 |
| CLSClusteringS2S | 0.76 | nan | nan | 0.76 |
| CMedQAv1-reranking | 0.95 | nan | 0.68 | 0.95 |
| CMedQAv2-reranking | 0.95 | nan | 0.67 | 0.95 |
| CQADupstackGamingRetrieval | 0.75 | 0.71 | 0.59 | 0.79 |
| CQADupstackUnixRetrieval | 0.70 | 0.54 | 0.40 | 0.72 |
| ClimateFEVERHardNegatives | 0.52 | 0.31 | 0.26 | 0.52 |
| CmedqaRetrieval | 0.52 | nan | 0.29 | 0.57 |
| Cmnli | 0.94 | nan | nan | 0.94 |
| CovidRetrieval | 0.93 | 0.79 | 0.76 | 0.96 |
| DuRetrieval | 0.82 | nan | 0.85 | 0.94 |
| EcomRetrieval | 0.75 | nan | 0.55 | 0.78 |
| FEVERHardNegatives | 0.83 | 0.89 | 0.84 | 0.95 |
| FiQA2018 | 0.63 | 0.62 | 0.44 | 0.80 |
| HotpotQAHardNegatives | 0.72 | 0.87 | 0.71 | 0.87 |
| IFlyTek | 0.72 | nan | 0.42 | 0.72 |
| ImdbClassification | 0.99 | 0.95 | 0.89 | 0.99 |
| JDReview | 0.95 | nan | 0.81 | 0.95 |
| LCQMC | 0.82 | nan | 0.76 | 0.82 |
| MMarcoReranking | 0.28 | nan | 0.29 | 0.47 |
| MMarcoRetrieval | 0.77 | nan | 0.79 | 0.90 |
| MTOPDomainClassification | 0.99 | 0.98 | 0.90 | 1.00 |
| MassiveIntentClassification | 0.74 | 0.82 | 0.60 | 0.92 |
| MassiveScenarioClassification | 0.88 | 0.87 | 0.70 | 0.99 |
| MedicalRetrieval | 0.73 | nan | 0.51 | 0.76 |
| MedrxivClusteringP2P | 0.74 | nan | 0.32 | 0.74 |
| MedrxivClusteringS2S | 0.72 | nan | 0.30 | 0.72 |
| MindSmallReranking | 0.35 | 0.33 | 0.30 | 0.35 |
| MultilingualSentiment | 0.97 | nan | 0.71 | 0.97 |
| Ocnli | 0.96 | nan | nan | 0.96 |
| OnlineShopping | 0.99 | nan | 0.90 | 0.99 |
| PAWSX | 0.80 | nan | 0.15 | 0.80 |
| QBQTC | 0.58 | nan | nan | 0.71 |
| SCIDOCS | 0.54 | 0.25 | 0.17 | 0.54 |
| SICK-R | 0.90 | 0.83 | 0.80 | 0.95 |
| STS12 | 0.92 | 0.82 | 0.80 | 0.95 |
| STS13 | 0.96 | 0.90 | 0.82 | 0.98 |
| STS14 | 0.93 | 0.85 | 0.78 | 0.98 |
| STS15 | 0.95 | 0.90 | 0.89 | 0.98 |
| STS17 | 0.91 | 0.89 | 0.82 | 0.93 |
| STS22.v2 | 0.87 | 0.72 | 0.64 | 0.87 |
| STSB | 0.96 | 0.85 | 0.82 | 0.96 |
| STSBenchmark | 0.96 | 0.89 | 0.87 | 0.96 |
| SprintDuplicateQuestions | 0.99 | 0.97 | 0.93 | 0.99 |
| StackExchangeClustering.v2 | 0.82 | 0.92 | 0.46 | 0.92 |
| StackExchangeClusteringP2P.v2 | 0.57 | 0.51 | 0.39 | 0.57 |
| SummEvalSummarization.v2 | 0.79 | 0.38 | 0.31 | 0.79 |
| T2Reranking | 0.68 | 0.68 | 0.66 | 0.73 |
| T2Retrieval | 0.79 | nan | 0.76 | 0.89 |
| TNews | 0.88 | nan | 0.49 | 0.88 |
| TRECCOVID | 0.76 | 0.86 | 0.71 | 0.95 |
| ThuNewsClusteringP2P | 0.91 | nan | nan | 0.91 |
| ThuNewsClusteringS2S | 0.88 | nan | nan | 0.88 |
| Touche2020Retrieval.v3 | 0.55 | 0.52 | 0.50 | 0.75 |
| ToxicConversationsClassification | 0.97 | 0.89 | 0.66 | 0.98 |
| TweetSentimentExtractionClassification | 0.94 | 0.70 | 0.63 | 0.94 |
| TwentyNewsgroupsClustering.v2 | 0.91 | 0.57 | 0.39 | 0.91 |
| TwitterSemEval2015 | 0.92 | 0.79 | 0.75 | 0.92 |
| TwitterURLCorpus | 0.93 | 0.87 | 0.86 | 0.96 |
| VideoRetrieval | 0.72 | nan | 0.58 | 0.84 |
| Waimai | 0.98 | nan | 0.86 | 0.98 |
| Average | 0.80 | 0.75 | 0.61 | 0.84 |
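The comment does not state how the Average row treats nan entries. As a minimal sketch, assuming missing scores are simply skipped, the per-model mean and the gap between the new model and the Max result column could be computed as follows. Only three rows of the table are copied in, and `mean_skipping_nan` and `summarize` are hypothetical helpers, not part of the bot or of mteb.

```python
import math

NAN = float("nan")

# Column order of the comparison table; "Max result" is the last tuple entry.
MODELS = (
    "Kingsoft-LLM/QZhou-Embedding",
    "google/gemini-embedding-001",
    "intfloat/multilingual-e5-large",
)

# Three rows copied from the table: task -> (QZhou, Gemini, e5-large, max result).
# nan marks a task that a model was not evaluated on.
ROWS = {
    "AFQMC": (0.72, NAN, 0.33, 0.72),
    "AmazonCounterfactualClassification": (0.97, 0.88, 0.70, 0.97),
    "ArguAna": (0.77, 0.86, 0.54, 0.90),
}


def mean_skipping_nan(values):
    """Average the scores that are present; return nan if none are."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else NAN


def summarize(rows):
    """Return per-model averages and, per task, the new model's gap to the max."""
    averages = {
        name: mean_skipping_nan([scores[i] for scores in rows.values()])
        for i, name in enumerate(MODELS)
    }
    # Column 0 is the new model, column 3 is the best known result.
    gaps = {task: scores[3] - scores[0] for task, scores in rows.items()}
    return averages, gaps


averages, gaps = summarize(ROWS)
```

Under this assumption, each column is averaged over a different set of tasks: google/gemini-embedding-001 has no score for many tasks in this table (mostly the Chinese ones), so its Average would not be directly comparable to the other columns.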

@KennethEnevoldsen

Contributor

Hi @PennyYu123, I can find neither this model on Hugging Face nor an implementation within the mteb package.

@Samoed

Samoed commented Aug 2, 2025

Member

@KennethEnevoldsen It was added in this PR embeddings-benchmark/mteb#2965, but I can't find it on HF either.

@KennethEnevoldsen KennethEnevoldsen added the waiting for review of implementation This PR is waiting for an implementation review before merging the results. label Aug 2, 2025
@KennethEnevoldsen

Contributor

Given that the model submission was closed, I will also close this. Feel free to reopen if you resubmit the model.

@PennyYu123

Contributor Author

I have corrected the incorrect information in model_meta in the latest mteb PR and updated the model implementation code. Our Qingzhou model is currently private and will be made public on HF tomorrow.

@KennethEnevoldsen

Contributor

Ah, that is great. Do feel free to reopen the PR once the implementation has been reviewed.

@PennyYu123 PennyYu123 deleted the qzhou-embedding-results branch August 3, 2025 03:01
@PennyYu123 PennyYu123 mentioned this pull request Aug 4, 2025

Labels

waiting for review of implementation This PR is waiting for an implementation review before merging the results.


3 participants