model: NightOwl-CodeEmbedding by Shun0212 · Pull Request #4791 · embeddings-benchmark/mteb

Shun0212 · 2026-06-10T01:01:20Z

Summary

Adds Shuu12121/NightOwl-CodeEmbedding to the MTEB model registry.

NightOwl-CodeEmbedding is a Sentence Transformers-compatible ModernBERT model specialized for code retrieval. It uses CLS pooling and cosine similarity, and it does not require query or document prefixes.
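As a rough illustration of what CLS pooling with cosine similarity means, the sketch below uses small made-up token vectors in place of real ModernBERT hidden states. The function names and toy values are hypothetical and are not part of the mteb or Sentence Transformers APIs. With CLS pooling, a text's embedding is simply the hidden state of its first ([CLS]) token, and relevance is scored as the cosine of the angle between the query and document embeddings.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: use the hidden state of the first ([CLS]) token as the text embedding."""
    return token_embeddings[0]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical per-token hidden states (tokens x 4 dims); real ones come from the model.
query_tokens = [[0.2, 0.1, 0.0, 0.7], [0.5, 0.5, 0.5, 0.5], [0.9, 0.0, 0.1, 0.0]]
doc_a_tokens = [[0.3, 0.1, 0.1, 0.6], [0.0, 1.0, 0.0, 0.0]]
doc_b_tokens = [[0.9, 0.0, 0.4, 0.0], [0.2, 0.2, 0.2, 0.2]]

# No query/document prefixes: queries and documents are embedded the same way.
q = cls_pool(query_tokens)
scores = {
    name: cosine_similarity(q, cls_pool(tokens))
    for name, tokens in [("doc_a", doc_a_tokens), ("doc_b", doc_b_tokens)]
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # prints ['doc_a', 'doc_b']
```

With the actual model, Sentence Transformers presumably applies this pooling through the configuration shipped with the model, so encoded outputs would already be the pooled vectors.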

Evaluation

The model was evaluated on 12 representative code-retrieval tasks using MTEB. The macro-average NDCG@10 was 0.70240.

Task	Split	NDCG@10
AppsRetrieval	test	0.36361
COIRCodeSearchNetRetrieval	test	0.84063
CodeEditSearchRetrieval	train	0.74720
CodeFeedbackMT	test	0.76277
CodeFeedbackST	test	0.85137
CodeSearchNetCCRetrieval	test	0.91646
CodeSearchNetRetrieval	test	0.89187
CodeTransOceanContest	test	0.74091
CodeTransOceanDL	test	0.35802
CosQA	test	0.41207
StackOverflowQA	test	0.86031
SyntheticText2SQL	test	0.68354
Macro average		0.70240
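The macro average is the unweighted mean of the 12 per-task scores, so every task counts equally regardless of its size. The short check below only recomputes the reported figure from the table values; it does not re-run any evaluation.

```python
# Per-task NDCG@10 scores copied from the table above.
ndcg_at_10 = {
    "AppsRetrieval": 0.36361,
    "COIRCodeSearchNetRetrieval": 0.84063,
    "CodeEditSearchRetrieval": 0.74720,
    "CodeFeedbackMT": 0.76277,
    "CodeFeedbackST": 0.85137,
    "CodeSearchNetCCRetrieval": 0.91646,
    "CodeSearchNetRetrieval": 0.89187,
    "CodeTransOceanContest": 0.74091,
    "CodeTransOceanDL": 0.35802,
    "CosQA": 0.41207,
    "StackOverflowQA": 0.86031,
    "SyntheticText2SQL": 0.68354,
}

# Macro average: unweighted mean over tasks (each task counts equally).
macro_average = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(f"{macro_average:.5f}")  # prints 0.70240
```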

There is currently no original paper associated with this model. Detailed benchmark results are available in the model card.

For training data similar to CodeEditSearch, I used a custom dataset derived from bigcode/commitpackft. Rows overlapping with cassanof/CodeEditSearch were excluded by matching content-, diff-, commit-, and repo/commit-based hashes. If CodeEditSearch should still be listed as related training data in the model metadata, I would be happy to add it.
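The overlap filtering described above could look roughly like the sketch below. It is not the actual pipeline used for this model: the row fields (`repo`, `commit`, `content`, `diff`), the use of SHA-256, and the whitespace stripping are all assumptions made for illustration. A training row is dropped if any one of its hash keys matches a key from an evaluation row.

```python
import hashlib

def _sha256(text):
    """Hex digest of a string, stripped of surrounding whitespace (an assumed normalization)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def row_keys(row):
    """Hypothetical hash keys for one row: content, diff, commit, and repo/commit."""
    return {
        ("content", _sha256(row["content"])),
        ("diff", _sha256(row["diff"])),
        ("commit", _sha256(row["commit"])),
        ("repo_commit", _sha256(row["repo"] + "@" + row["commit"])),
    }

def exclude_overlaps(train_rows, eval_rows):
    """Drop training rows that share any hash key with an evaluation row."""
    blocked = set()
    for row in eval_rows:
        blocked |= row_keys(row)
    return [row for row in train_rows if row_keys(row).isdisjoint(blocked)]

# Toy rows (hypothetical data, not taken from either dataset).
eval_rows = [{"repo": "a/x", "commit": "c1", "content": "def f(): pass", "diff": "+pass"}]
train_rows = [
    {"repo": "a/x", "commit": "c1", "content": "def g(): pass", "diff": "+g"},      # same commit
    {"repo": "b/y", "commit": "c9", "content": "def f(): pass", "diff": "+h"},      # same content
    {"repo": "c/z", "commit": "c7", "content": "def k(): return 1", "diff": "+k"},  # no overlap
]
kept = exclude_overlaps(train_rows, eval_rows)
print([row["repo"] for row in kept])  # prints ['c/z']
```

Tagging each digest with its key type keeps, for example, a content hash from being compared against a diff hash. Matching on several independent keys is stricter than a single exact-match filter, at the cost of occasionally dropping rows that share only a commit.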

Checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using:
- mteb.get_model(model_name, revision)
- mteb.get_model_meta(model_name, revision)
I have tested the implementation works on a representative set of tasks
The model is public, i.e., is available either as an API or the weights are publicly available to download
I reproduced results from the original paper (if applicable) on at least one benchmark, and I am including the results in the PR description
- Not applicable: there is currently no original paper associated with this model.

Shun0212 · 2026-06-10T06:07:36Z

Thank you for merging this PR!

Shun0212 and others added 6 commits June 7, 2026 07:26

- model: add NightOwl CodeEmbedding metadata (5299670)
- fix: remove programming language codes from model metadata (c786b35)
- fix: update memory usage for NightOwl CodeEmbedding model (7de17f6)
- fix: update NightOwl CodeEmbedding model metadata (2494553)
- fix: update revision for NightOwl CodeEmbedding model (3ff4539)
- Merge branch 'embeddings-benchmark:main' into add-nightowl-model (07b9da9)

Samoed added the "new model" label Jun 10, 2026

Samoed approved these changes Jun 10, 2026


Samoed enabled auto-merge (squash) June 10, 2026 05:54

Samoed merged commit b691769 into embeddings-benchmark:main Jun 10, 2026
18 of 19 checks passed

Shun0212 mentioned this pull request Jun 11, 2026

fix: update revision for NightOwl-CodeEmbedding #4799 (merged)

Samoed merged 6 commits into embeddings-benchmark:main from Shun0212:add-nightowl-model
