[Core] More fixes to MultiModalEmbeddings type handling #19715

DarkLight1337 merged 5 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Summary of Changes
Hello @russellb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request is a follow-up to a previous change that modified get_multimodal_embeddings() to always return a MultiModalEmbeddings object, rather than an optional one. My changes update numerous model implementations to align with this new behavior, specifically by refining how the presence of multimodal embeddings is checked, ensuring robustness and consistency across the codebase.
Highlights
- Multimodal Embeddings Type Handling: This PR completes the necessary adjustments for `MultiModalEmbeddings` type handling across various model implementations. It addresses models that were not updated in a previous PR (#19446) due to insufficient CI coverage.
- Code Consistency and Readability: The changes standardize conditional checks for `multimodal_embeddings` by replacing explicit `is not None` and `is None` comparisons with more Pythonic truthiness checks (`if multimodal_embeddings:` and `if not multimodal_embeddings:`). This improves code readability and consistency.
Code Review
This PR consistently updates the checks for multimodal_embeddings from is not None / is None to direct boolean evaluation (if multimodal_embeddings / if not multimodal_embeddings). This change is Pythonic and correctly handles the case where an empty collection (like an empty list []) signifies the absence of embeddings, treating it as falsy (or truthy for if not). This aligns with the described change in PR #19446 where get_multimodal_embeddings() no longer returns Optional.
A key point for consideration across all modified files is the type hint for the multimodal_embeddings parameter in the get_input_embeddings (and similar) methods. It's currently Optional[MultiModalEmbeddings] = None (or Optional[NestedTensors]). If this parameter is now guaranteed to be a MultiModalEmbeddings (or NestedTensors) instance and never None (with empty lists/tuples representing no embeddings), the type hint should be updated to remove Optional (and the default None potentially removed or changed) to accurately reflect this contract. This would enhance code clarity and maintainability. I've added specific comments detailing this, which applies broadly to all affected files.
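A minimal sketch of what that signature change could look like. `Tensor` below is a plain stand-in for `torch.Tensor` (torch is not imported here), and both function bodies are illustrative rather than vLLM's actual implementation; the idea is that once `Optional` is removed, an empty tuple rather than `None` signals "no embeddings":

```python
from typing import Optional, Union

Tensor = list  # stand-in for torch.Tensor in this sketch (assumption)

MultiModalEmbeddings = Union[list[Tensor], Tensor, tuple[Tensor, ...]]


# Before: the caller may pass None, so every call site needs an
# `is None` check in addition to the emptiness check.
def get_input_embeddings_old(
    input_ids,
    multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
) -> str:
    if multimodal_embeddings is not None and len(multimodal_embeddings) != 0:
        return "merged"
    return "text-only"


# After: Optional removed; an empty tuple is the "no embeddings"
# default, so a single len() check suffices everywhere.
def get_input_embeddings_new(
    input_ids,
    multimodal_embeddings: MultiModalEmbeddings = (),
) -> str:
    if len(multimodal_embeddings) != 0:
        return "merged"
    return "text-only"


print(get_input_embeddings_new([1, 2]))            # text-only
print(get_input_embeddings_new([1, 2], ([0.5],)))  # merged
```

The tightened contract means call sites no longer have to defend against two different "empty" representations.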
PTAL at the failing engine test
vllm/model_executor/models/blip2.py (Outdated)

```diff
     ) -> torch.Tensor:
         inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-        if multimodal_embeddings is not None:
+        if multimodal_embeddings:
```
Isn't this going to potentially cause a `RuntimeError: Boolean value of Tensor with more than one value is ambiguous` error?
Since this type is defined as

```python
MultiModalEmbeddings = Union[list[Tensor], Tensor, tuple[Tensor, ...]]
```

the `if` clause could potentially test a `torch.Tensor` directly. For example, in blip2, `get_multimodal_embeddings()` calls `_process_image_input`, which in turn calls `language_projection.forward`, which will return a tensor.
Actually, one of the tests is failing for this reason:

```
FAILED engine/test_short_mm_context.py::test_context_length_too_short[llava-hf/llava-1.5-7b-hf] - RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
`if len(multimodal_embeddings) != 0:` should work for lists, tuples and tensors.
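To see why the truthiness check breaks while `len()` works, here is a small self-contained sketch. `FakeTensor` is a hypothetical stand-in that mimics only `torch.Tensor`'s boolean and length semantics, so the example runs without a torch dependency:

```python
class FakeTensor:
    """Stand-in mimicking torch.Tensor's __bool__/__len__ behavior."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __bool__(self):
        # Like torch.Tensor: only a single-element tensor has a
        # well-defined truth value; anything else raises.
        if len(self._values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous")
        return bool(self._values[0])


emb = FakeTensor([0.1, 0.2, 0.3])

# `if emb:` (the truthiness check) raises, just like the failing CI test:
try:
    bool(emb)
except RuntimeError as err:
    print(f"raises: {err}")

# `len(...) != 0` works uniformly for lists, tuples, and tensors:
for candidate in ([], (), [0.1], FakeTensor([]), emb):
    print(len(candidate) != 0)
```

This is exactly why the checks in the PR were moved from truthiness to `len()`.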
Thanks. I'm updating the checks to use `len()`
I'm worried about this line in the case when we use the default argument - shouldn't this be like the other lines? (Edited; I had a typo where I copied the `is not None` condition when it should check `is None`.)

```python
if multimodal_embeddings is None or len(multimodal_embeddings) == 0:
```

If it should be, maybe it makes sense to make a single utility function to make sure they're all the same (check `None` and `len == 0` in the function), or change the argument to the `NestedTensors` type here if we do not ever want to pass `None` here.
thanks - i'll fix these
done. i fixed all the spots that came up with `git grep 'if len(multimodal_embeddings'`
```diff
-        if len(multimodal_embeddings) == 0:
+        if not multimodal_embeddings:
```
cc @russellb
@aarnphm, if the `multimodal_embeddings` is a `torch.Tensor`, you get `RuntimeError: Boolean value of Tensor with more than one value is ambiguous` if you try to get the truth value directly.
That's what I used before, but doesn't work if it's a tensor
cc @maxdebayser who pointed this out and submitted the change to use `len` instead
This is a follow up to PR vllm-project#19446. In that PR, get_multimodal_embeddings() was changed to return `MultiModalEmbeddings` instead of `Optional[MultiModalEmbeddings]` because code in the model runner was requiring that the result was not `None`. Several models needed tweaks to account for this. Many were missed because they were not tested in CI. This should fix the rest of the common changes needed that weren't caught by CI. Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
FIX #19736
This is a follow up to PR #19446.

In that PR, get_multimodal_embeddings() was changed to return `MultiModalEmbeddings` instead of `Optional[MultiModalEmbeddings]` because code in the model runner was requiring that the result was not `None`.

Several models needed tweaks to account for this. Many were missed because they were not tested in CI. This should fix the rest of the common changes needed that weren't caught by CI.
Signed-off-by: Russell Bryant rbryant@redhat.com