model: Add video support to qwen3-vl embedding by Samoed · Pull Request #4699 · embeddings-benchmark/mteb

Samoed · 2026-05-20T21:34:22Z

Created MultimodalInstructSentenceTransformerModel
Changed implementation of Qwen3-VL-Embeddings to native SentenceTransformers (with video support)

I got 0.74249 on Vidore3ComputerScienceRetrieval.v2 (eng subset); the existing results have 0.74409. I think the new implementation is close enough.
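For a sense of how close the two scores are, here is a minimal sketch that compares them. The two numbers are the ones quoted above; the tolerance is a hypothetical threshold chosen only for illustration, not a value used by mteb.

```python
# Scores quoted in the comment above.
new_score = 0.74249        # native SentenceTransformers implementation
reference_score = 0.74409  # value already present in the results

abs_diff = abs(reference_score - new_score)
rel_diff = abs_diff / reference_score

# Hypothetical tolerance, picked only for this illustration.
TOLERANCE = 0.005

print(f"absolute difference: {abs_diff:.5f}")
print(f"relative difference: {rel_diff:.2%}")
print("within tolerance:", abs_diff <= TOLERANCE)
```

The absolute gap is about 0.0016, which is well inside the illustrative tolerance.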

I ran BreakfastClassification and got:

{
  "dataset_revision": "59a874899eb241993794a3454c37829727c3b559",
  "task_name": "BreakfastClassification",
  "mteb_version": "2.13.1",
  "scores": {
    "test": [
      {
        "scores_per_experiment": [
          ...
        ],
        "accuracy": 0.471024,
        "f1": 0.448974,
        "f1_weighted": 0.467435,
        "precision": 0.482644,
        "precision_weighted": 0.503809,
        "recall": 0.459089,
        "recall_weighted": 0.471024,
        "ap": NaN,
        "ap_weighted": NaN,
        "main_score": 0.471024,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ],
        "mteb_version": "2.13.1"
      }
    ]
  },
  "evaluation_time": 101.02676701545715,
  "kg_co2_emissions": null,
  "date": 1779312638.304263
}
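A side note on reading a result file like this one: the bare `NaN` values for `ap` and `ap_weighted` are not valid strict JSON, but Python's `json` module accepts them by default. The sketch below uses a trimmed, hand-copied subset of the output above (not the full file) to pull out the main score and list the metrics that came back as NaN.

```python
import json
import math

# Trimmed copy of the result above; only a few fields are kept for brevity.
raw = """
{
  "task_name": "BreakfastClassification",
  "scores": {
    "test": [
      {
        "accuracy": 0.471024,
        "ap": NaN,
        "ap_weighted": NaN,
        "main_score": 0.471024,
        "hf_subset": "default"
      }
    ]
  }
}
"""

# json.loads parses the non-standard NaN literal into float("nan") by default.
result = json.loads(raw)
test_scores = result["scores"]["test"][0]
main_score = test_scores["main_score"]

# Collect the metrics that could not be computed.
nan_metrics = sorted(
    name
    for name, value in test_scores.items()
    if isinstance(value, float) and math.isnan(value)
)

print(result["task_name"], main_score, nan_metrics)
```

Strict parsers in other languages may reject the NaN values, so it can be worth checking for them before passing the file on.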

Samoed · 2026-05-28T09:30:44Z

@KennethEnevoldsen Can you review this when you have time?

KennethEnevoldsen

Sorry I didn't see this earlier. I think we can avoid the wrapper class (in general, I think we could combine most of the ST encoder wrappers into one).

KennethEnevoldsen · 2026-06-08T09:39:47Z

        return embeddings
+
+
+class MultimodalInstructSentenceTransformerModel(InstructSentenceTransformerModel):


Why not simply integrate this into InstructSentenceTransformerModel? It seems like we don't need two classes for this.
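To illustrate the idea of one class instead of two, here is a rough sketch in which a single wrapper takes the supported modalities as a parameter rather than needing a multimodal subclass. Everything in it is hypothetical: `InstructEncoderSketch`, its fields, and `build_inputs` are made-up names for illustration and are not the mteb or SentenceTransformers API.

```python
from dataclasses import dataclass


@dataclass
class InstructEncoderSketch:
    """Hypothetical stand-in for a single instruct wrapper (not the mteb class)."""

    # Hypothetical prompt template; the real template is model-specific.
    instruction_template: str = "Instruct: {instruction}\nQuery: "
    # Text-only by default; multimodal models pass extra modalities here.
    modalities: tuple = ("text",)

    def build_inputs(self, batch: dict, instruction: str) -> list:
        """Turn a column-oriented batch into one input dict per example."""
        unsupported = set(batch) - set(self.modalities)
        if unsupported:
            raise ValueError(f"Unsupported modalities: {sorted(unsupported)}")

        prefix = self.instruction_template.format(instruction=instruction)
        n_items = len(next(iter(batch.values())))
        items = []
        for i in range(n_items):
            # Non-text modalities are passed through untouched.
            item = {m: batch[m][i] for m in batch if m != "text"}
            # The instruction is always attached to the text field.
            text = batch["text"][i] if "text" in batch else ""
            item["text"] = prefix + text
            items.append(item)
        return items


text_only = InstructEncoderSketch()
multimodal = InstructEncoderSketch(modalities=("text", "image", "video"))

text_inputs = text_only.build_inputs({"text": ["what is mteb?"]}, "Retrieve docs")
video_inputs = multimodal.build_inputs({"video": ["clip_0.mp4"]}, "Classify the action")
```

With this shape, a text-only model and a video-capable model differ only in the `modalities` argument, which is the kind of merge the comment asks about.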

KennethEnevoldsen · 2026-06-08T09:41:38Z

    training_datasets=None,
    citation=QWEN3_VL_EMBEDDING_CITATION,
-    extra_requirements_groups=["qwen-vl"],
+    extra_requirements_groups=["multimodal-sbert"],


multimodal-sbert seems a bit odd to me (it has been a while since people called it sbert). I would use multimodal-sentence-transformer, but that would be a breaking change, so it may not be worth changing.

Samoed added 5 commits May 19, 2026 23:51

add video to qwen3

613ec4c

Merge branch 'main' into add_video_qwen3

4219162

upd implementation

10aa2f3

upd revision

40dccab

fix MultimodalInstructSentenceTransformerModel

298b612

Samoed requested review from AdnanElAssadi56 and KennethEnevoldsen May 20, 2026 21:34

Samoed added the video video extension label May 20, 2026

Samoed changed the title ~~Add video qwen3~~ Add video qwen3-vl embedding May 20, 2026

Samoed changed the title ~~Add video qwen3-vl embedding~~ Add video to qwen3-vl embedding May 21, 2026

Samoed added 2 commits May 21, 2026 12:55

disable double sampling

e6edcc5

move imports inside

ff9a69b

AdnanElAssadi56 approved these changes May 21, 2026

KennethEnevoldsen reviewed Jun 8, 2026

Samoed added 4 commits June 9, 2026 00:50

Merge branch 'main' into add_video_qwen3

97eaeb3

simpliffy wrapper

0a82871

Merge branch 'main' into add_video_qwen3

976ad31

fix typing

7a1afcd

Samoed requested a review from KennethEnevoldsen June 9, 2026 16:12

KennethEnevoldsen approved these changes Jun 9, 2026

KennethEnevoldsen changed the title ~~Add video to qwen3-vl embedding~~ model: Add video support to qwen3-vl embedding Jun 9, 2026

KennethEnevoldsen enabled auto-merge (squash) June 9, 2026 16:12

KennethEnevoldsen merged commit beee210 into main Jun 9, 2026
12 of 13 checks passed

KennethEnevoldsen deleted the add_video_qwen3 branch June 9, 2026 16:23

Samoed mentioned this pull request Jun 14, 2026

Add qwen3 results on vidore3&3.1 embeddings-benchmark/results#567

Open

6 tasks

KennethEnevoldsen merged 11 commits into main from add_video_qwen3
