[Core] More fixes to MultiModalEmbeddings type handling #19715

DarkLight1337 merged 5 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Summary of Changes
Hello @russellb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request is a follow-up to a previous change that modified get_multimodal_embeddings() to always return a MultiModalEmbeddings object, rather than an optional one. My changes update numerous model implementations to align with this new behavior, specifically by refining how the presence of multimodal embeddings is checked, ensuring robustness and consistency across the codebase.
Highlights
- Multimodal Embeddings Type Handling: This PR completes the necessary adjustments for `MultiModalEmbeddings` type handling across various model implementations. It addresses models that were not updated in a previous PR (#19446) due to insufficient CI coverage.
- Code Consistency and Readability: The changes standardize conditional checks for `multimodal_embeddings` by replacing explicit `is not None` and `is None` comparisons with more Pythonic truthiness checks (`if multimodal_embeddings:` and `if not multimodal_embeddings:`). This improves code readability and consistency.
Code Review
This PR consistently updates the checks for multimodal_embeddings from is not None / is None to direct boolean evaluation (if multimodal_embeddings / if not multimodal_embeddings). This change is Pythonic and correctly handles the case where an empty collection (like an empty list []) signifies the absence of embeddings, treating it as falsy (or truthy for if not). This aligns with the described change in PR #19446 where get_multimodal_embeddings() no longer returns Optional.
A key point for consideration across all modified files is the type hint for the multimodal_embeddings parameter in the get_input_embeddings (and similar) methods. It's currently Optional[MultiModalEmbeddings] = None (or Optional[NestedTensors]). If this parameter is now guaranteed to be a MultiModalEmbeddings (or NestedTensors) instance and never None (with empty lists/tuples representing no embeddings), the type hint should be updated to remove Optional (and the default None potentially removed or changed) to accurately reflect this contract. This would enhance code clarity and maintainability. I've added specific comments detailing this, which applies broadly to all affected files.
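A minimal sketch of what that signature change could look like. `Tensor` below is a plain stand-in for `torch.Tensor` (torch is not imported here), and both function bodies are illustrative rather than vLLM's actual implementation; the idea is that once `Optional` is removed, an empty tuple rather than `None` signals "no embeddings":

```python
from typing import Optional, Union

Tensor = list  # stand-in for torch.Tensor in this sketch (assumption)

MultiModalEmbeddings = Union[list[Tensor], Tensor, tuple[Tensor, ...]]


# Before: the caller may pass None, so every call site needs an
# `is None` check in addition to the emptiness check.
def get_input_embeddings_old(
    input_ids,
    multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
) -> str:
    if multimodal_embeddings is not None and len(multimodal_embeddings) != 0:
        return "merged"
    return "text-only"


# After: Optional removed; an empty tuple is the "no embeddings"
# default, so a single len() check suffices everywhere.
def get_input_embeddings_new(
    input_ids,
    multimodal_embeddings: MultiModalEmbeddings = (),
) -> str:
    if len(multimodal_embeddings) != 0:
        return "merged"
    return "text-only"


print(get_input_embeddings_new([1, 2]))            # text-only
print(get_input_embeddings_new([1, 2], ([0.5],)))  # merged
```

The tightened contract means call sites no longer have to defend against two different "empty" representations.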
PTAL at the failing engine test
vllm/model_executor/models/blip2.py (Outdated)

```diff
     ) -> torch.Tensor:
         inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-        if multimodal_embeddings is not None:
+        if multimodal_embeddings:
```
Isn't this going to potentially cause a `RuntimeError: Boolean value of Tensor with more than one value is ambiguous` error?
Since this type is defined as

```python
MultiModalEmbeddings = Union[list[Tensor], Tensor, tuple[Tensor, ...]]
```

the `if` clause could potentially test a `torch.Tensor` directly. For example, in blip2, `get_multimodal_embeddings()` calls `_process_image_input`, which in turn calls `language_projection.forward`, which will return a tensor.
Actually, one of the tests is failing for this reason:

```
FAILED engine/test_short_mm_context.py::test_context_length_too_short[llava-hf/llava-1.5-7b-hf] - RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
`if len(multimodal_embeddings) != 0:` should work for lists, tuples and tensors.
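To see why the truthiness check breaks while `len()` works, here is a small self-contained sketch. `FakeTensor` is a hypothetical stand-in that mimics only `torch.Tensor`'s boolean and length semantics, so the example runs without a torch dependency:

```python
class FakeTensor:
    """Stand-in mimicking torch.Tensor's __bool__/__len__ behavior."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __bool__(self):
        # Like torch.Tensor: only a single-element tensor has a
        # well-defined truth value; anything else raises.
        if len(self._values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous")
        return bool(self._values[0])


emb = FakeTensor([0.1, 0.2, 0.3])

# `if emb:` (the truthiness check) raises, just like the failing CI test:
try:
    bool(emb)
except RuntimeError as err:
    print(f"raises: {err}")

# `len(...) != 0` works uniformly for lists, tuples, and tensors:
for candidate in ([], (), [0.1], FakeTensor([]), emb):
    print(len(candidate) != 0)
```

This is exactly why the checks in the PR were moved from truthiness to `len()`.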
Thanks. I'm updating the checks to use `len()`
I'm worried about this line in the case when we use the default argument - shouldn't this be like the other lines? (Edited; I had a typo where I copied the `is not None` condition when it should check `is None`.)

```python
if multimodal_embeddings is None or len(multimodal_embeddings) == 0:
```

If it should be, maybe it makes sense to make a single utility function to make sure they're all the same (check `None` and `len == 0` in the function), or change the argument to the `NestedTensors` type here if we do not ever want to pass `None` here.
thanks - i'll fix these
done. i fixed all the spots that came up with `git grep 'if len(multimodal_embeddings'`
```diff
-        if len(multimodal_embeddings) == 0:
+        if not multimodal_embeddings:
```
cc @russellb
@aarnphm, if the `multimodal_embeddings` is a `torch.Tensor`, you get `RuntimeError: Boolean value of Tensor with more than one value is ambiguous` if you try to get the truth value directly.
That's what I used before, but doesn't work if it's a tensor
cc @maxdebayser who pointed this out and submitted the change to use `len` instead
This is a follow up to PR vllm-project#19446. In that PR, get_multimodal_embeddings() was changed to return `MultiModalEmbeddings` instead of `Optional[MultiModalEmbeddings]` because code in the model runner was requiring that the result was not `None`. Several models needed tweaks to account for this. Many were missed because they were not tested in CI. This should fix the rest of the common changes needed that weren't caught by CI. Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
FIX #19736
This is a follow up to PR #19446.

In that PR, get_multimodal_embeddings() was changed to return `MultiModalEmbeddings` instead of `Optional[MultiModalEmbeddings]` because code in the model runner was requiring that the result was not `None`.

Several models needed tweaks to account for this. Many were missed because they were not tested in CI. This should fix the rest of the common changes needed that weren't caught by CI.
Signed-off-by: Russell Bryant rbryant@redhat.com