model: Add video support to qwen3-vl embedding#4699

Merged: KennethEnevoldsen merged 11 commits into main from add_video_qwen3 on Jun 9, 2026
Samoed (Member) commented May 20, 2026

  1. Created MultimodalInstructSentenceTransformerModel
  2. Changed the Qwen3-VL-Embeddings implementation to use native SentenceTransformers (with video support)

I got 0.74249 on Vidore3ComputerScienceRetrieval.v2 (eng subset); the existing results have 0.74409. I think the new implementation is close enough.
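
For reference, the gap between the two scores can be quantified with a few lines of arithmetic. This is only a sketch using the two numbers quoted above; the variable names are illustrative, and no tolerance threshold is implied by the PR.

```python
# Scores quoted in this comment (Vidore3ComputerScienceRetrieval.v2, eng subset).
reported = 0.74409  # value already present in the results
new = 0.74249       # value obtained with the new implementation

abs_diff = reported - new
rel_diff = abs_diff / reported

print(f"absolute difference: {abs_diff:.5f}")
print(f"relative difference: {rel_diff:.2%}")
```

The difference is about 0.0016 in absolute terms, roughly a fifth of a percent relative to the reported score.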

I ran BreakfastClassification and got:

{
  "dataset_revision": "59a874899eb241993794a3454c37829727c3b559",
  "task_name": "BreakfastClassification",
  "mteb_version": "2.13.1",
  "scores": {
    "test": [
      {
        "scores_per_experiment": [
          ...
        ],
        "accuracy": 0.471024,
        "f1": 0.448974,
        "f1_weighted": 0.467435,
        "precision": 0.482644,
        "precision_weighted": 0.503809,
        "recall": 0.459089,
        "recall_weighted": 0.471024,
        "ap": NaN,
        "ap_weighted": NaN,
        "main_score": 0.471024,
        "hf_subset": "default",
        "languages": [
          "eng-Latn"
        ],
        "mteb_version": "2.13.1"
      }
    ]
  },
  "evaluation_time": 101.02676701545715,
  "kg_co2_emissions": null,
  "date": 1779312638.304263
}
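
A result of this shape can be inspected with Python's standard `json` module, which accepts the bare `NaN` values by default even though they are not strict JSON. The snippet below is a minimal sketch on a trimmed copy of the output above (only a few fields are kept); it is not how mteb itself loads results.

```python
import json
import math

# Trimmed copy of the result above; only a handful of fields are kept.
raw = """
{
  "task_name": "BreakfastClassification",
  "scores": {
    "test": [
      {
        "accuracy": 0.471024,
        "recall_weighted": 0.471024,
        "ap": NaN,
        "ap_weighted": NaN,
        "main_score": 0.471024,
        "hf_subset": "default"
      }
    ]
  }
}
"""

result = json.loads(raw)  # Python's json parses NaN into float("nan") by default
test_scores = result["scores"]["test"][0]

# Collect the metrics that were not computed (reported as NaN).
nan_metrics = [
    name
    for name, value in test_scores.items()
    if isinstance(value, float) and math.isnan(value)
]

print("main_score:", test_scores["main_score"])
print("NaN metrics:", nan_metrics)
```

Note that `recall_weighted` equals `accuracy` here, which is expected for single-label classification: support-weighted recall reduces to the fraction of correctly classified examples.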

@Samoed Samoed added the video video extension label May 20, 2026
@Samoed Samoed changed the title Add video qwen3 Add video qwen3-vl embedding May 20, 2026
@Samoed Samoed changed the title Add video qwen3-vl embedding Add video to qwen3-vl embedding May 21, 2026
Samoed (Member, Author) commented May 28, 2026


@KennethEnevoldsen Can you review this when you have time?

KennethEnevoldsen (Contributor) left a comment

Sorry I didn't see this earlier. I think we can avoid the wrapper class (in general, I think we could combine most of the ST encoder wrappers into one).

Comment thread mteb/models/instruct_wrapper.py Outdated
return embeddings


class MultimodalInstructSentenceTransformerModel(InstructSentenceTransformerModel):

Contributor

Why not simply integrate this into InstructSentenceTransformerModel? It seems like we don't need two classes for this.

Member Author

Merged
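
To illustrate the idea discussed in this thread, here is a minimal sketch of a single wrapper that covers both text-only and multimodal models through a configuration field instead of a subclass. Everything in it is hypothetical: `InstructEncoderSketch`, `modalities`, `instruction_template`, and `build_inputs` are illustrative names, not the actual mteb `InstructSentenceTransformerModel` API, and the instruction format is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InstructEncoderSketch:
    """Hypothetical stand-in for a single instruct wrapper (not mteb's actual class)."""

    # Modalities the underlying model accepts; text-only models keep the default.
    modalities: tuple = ("text",)
    # Assumed instruction format, for illustration only.
    instruction_template: str = "Instruct: {instruction}\n"

    def build_inputs(self, batch: dict, instruction: str) -> list:
        """Turn a column-oriented batch into one input dict per example."""
        prompt = self.instruction_template.format(instruction=instruction)
        # Keep only the columns this model supports.
        columns = {k: v for k, v in batch.items() if k in self.modalities}
        if not columns:
            raise ValueError(f"batch has none of the supported modalities: {self.modalities}")
        n_examples = len(next(iter(columns.values())))
        inputs = []
        for i in range(n_examples):
            example = {name: values[i] for name, values in columns.items()}
            # Prepend the instruction to the text; use it alone if there is no text.
            example["text"] = prompt + example.get("text", "")
            inputs.append(example)
        return inputs


text_only = InstructEncoderSketch()
multimodal = InstructEncoderSketch(modalities=("text", "image", "video"))

batch = {"text": ["a person frying eggs"], "video": ["clip_0.mp4"]}
text_inputs = text_only.build_inputs(batch, "Retrieve relevant videos")
video_inputs = multimodal.build_inputs(batch, "Retrieve relevant videos")
```

With this shape, a text-only model simply ignores the extra columns, while a multimodal model forwards them, so one class can serve both cases.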

training_datasets=None,
citation=QWEN3_VL_EMBEDDING_CITATION,
-extra_requirements_groups=["qwen-vl"],
+extra_requirements_groups=["multimodal-sbert"],

Contributor

multimodal-sbert seems a bit funny to me (it has been a while since people called it sbert). I would use multimodal-sentence-transformer, but that would be a breaking change, so it is probably not worth changing.

@Samoed Samoed requested a review from KennethEnevoldsen June 9, 2026 16:12
@KennethEnevoldsen KennethEnevoldsen changed the title Add video to qwen3-vl embedding model: Add video support to qwen3-vl embedding Jun 9, 2026
@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) June 9, 2026 16:12
@KennethEnevoldsen KennethEnevoldsen merged commit beee210 into main Jun 9, 2026
12 of 13 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the add_video_qwen3 branch June 9, 2026 16:23